Abstract


Recently, several new algorithms have been developed for the minimum cut problem that sub- 
stantially improve worst-case time bounds for the problem. These algorithms are very different 
from the earlier ones and from each other. We conduct an experimental evaluation of the relative 
performance of these algorithms. In the process, we develop heuristics and data structures that 
substantially improve practical performance of the algorithms. We also develop problem fami- 
lies for testing minimum cut algorithms. Our work leads to a better understanding of practical 
performance of the minimum cut algorithms and produces very efficient codes for the problem. 
Chapter 1


Introduction 


A minimum cut of an n-vertex, m-edge, capacitated, undirected graph is a partition of the vertices 
into two sets that minimizes the total capacity of edges with endpoints in different sets. This 
concept is more natural in pictures than in words; see Figure 1.1. 


Figure 1.1: The dashed line shows a minimum cut of this graph. 


Computation of minimum cuts is useful in various applications. An easy example is network 
reliability theory [37, 53]. If edges of a network fail with some probability, it makes intuitive sense 
that the greatest danger of network disconnection is at a minimum cut. Minimum cuts also arise 
in information retrieval [6], compilers for parallel languages [7], and cutting-plane algorithms for 
the Traveling Salesman Problem (TSP) [3]. We have also received requests for our codes from 
researchers interested in routing in ATM networks and computational biology. 
1.1 Previous Work


The problem of finding a minimum cut has a long history. It was originally considered a harder 
variant of the minimum s-t cut problem, which places the further restriction that designated 
vertices s and t be on opposite sides of the partition. The well-known max-flow min-cut
theorem [22, 21] implies that an s-t minimum cut can be found by computing an s-t maximum flow.
In 1961, Gomory and Hu showed how to solve the minimum cut problem with $n - 1$ s-t minimum
cut computations. Subsequently there was much progress in computing maximum flows,
but no one has yet been able to prove a time bound better than $O(nm)$ for any of the best
algorithms [1, 9, 10, 25, 41]. Hence we cannot give a bound better than $O(n^2 m)$ for the
Gomory-Hu algorithm.


Gomory-Hu stood as the best algorithm for the problem until 1989, when Nagamochi and
Ibaraki [47] showed how to find a minimum cut without using maximum flows. Their algorithm
(which we will call NI) runs in $O(n(m + n \log n))$ time. In 1992, Hao and Orlin [29, 30] rejuvenated
the flow approach by showing that a clever modification of the Gomory-Hu algorithm implemented
with a push-relabel maximum flow algorithm runs in time asymptotically equal to the time to
compute one maximum flow: $O(nm \log(n^2/m))$. (We refer to this algorithm as HO.)


Progress continued in 1993 with a randomized algorithm (KS) given by Karger and Stein [39,
40]. With probability at least $1 - 1/n$, it finds all minimum cuts in $O(n^2 \log^3 n)$ time. Finally,
in 1996, Karger [38] gave two closely related algorithms (K). The first finds a minimum cut with
probability at least $1 - 1/n$ and runs in $O(m \log^3 n)$ time; the second finds all minimum cuts with
probability at least $1 - 1/n$ and runs in $O(n^2 \log n)$ time.


The recent burst of theoretical progress has outdated implementation experiments. In 1990, 
Padberg and Rinaldi published a study on practical implementation of Gomory-Hu [51], which 
is very valuable for the design of heuristics, but unfortunately came just before the theory
breakthroughs. Nagamochi et al. [48] confirm that NI often beats Gomory-Hu in practice. Nothing is
known about the practical performance of the other new algorithms. 


1.2 Our Contribution 


In this paper we address the question of the practical performance of minimum cut algorithms. We 
consider all of the contenders: NI, HO, KS, and K. Our goal is to obtain efficient implementations 
of all the algorithms and meaningful comparisons of their performance. Accomplishing this goal 
has two main aspects: obtaining good implementations and obtaining good tests. 


A major aspect of obtaining good implementations is making good use of heuristics. We intro- 
duce a new strategy for applying the heuristics of Padberg and Rinaldi, which turns out to be very 
important to the efficiency of our implementations. We give a modified version of KS that seems 
to be more effective. We also introduce a new heuristic for maximum-flow based cut algorithms. 


For both KS and K, which are Monte Carlo algorithms, guaranteeing correctness of our imple- 
mentations required using parameters from the theoretical analysis. Thus we rework the analysis 
for both of these algorithms to get the best constant factors, and end up having to prove new the- 
orems. In both cases it turns out that we believe there are stronger results than we can prove, so 
the practical performance of both of these algorithms stands to be improved directly by further
theoretical work. 


For K, the constants that we get from the theoretical analysis are unmanageably large, but we 
discovered that we always get the right answer with much smaller constants. Hence we cheat, 
and use a value that we cannot justify in our implementation. For this reason our implementation 
of K must be considered a heuristic—we do not have a proof of correctness. 


For many applications, HO appears to be the best algorithm, followed by NI, but overall, our 
tests show that no single algorithm dominates the others. In general, HO and NI dominate K and 
KS, but on one problem family both K and KS show better asymptotic performance than NI and 
HO. We also have problem families where HO is asymptotically better than NI and vice versa. 


Unfortunately, development and testing is iterative and interrelated. We develop tests that are 
hard for the implementations by looking for weaknesses in the implementations. Meanwhile, we 
use the tests to find weaknesses in the implementation and devise heuristics to improve perfor- 
mance. Thus it is difficult to be sure when we are done. For HO we take advantage of implemen- 
tation work on maximum flow algorithms [2, 13, 16, 17, 50]; for NI we take advantage of the work 
of Nagamochi et al. KS and K were both developed from scratch, so it remains possible that their 
inferior performance is due to the fact that we are the first to develop heuristics for them. 


Nevertheless, at the very least, we make significant progress in understanding how to imple- 
ment these algorithms, introduce new heuristics, and give an interesting set of problem genera- 
tors. 


Note that this paper represents joint work with Chandra Chekuri, Andrew Goldberg, David 
Karger, and Clifford Stein. A preliminary version appeared in SODA 97 [8]. 


The paper is organized as follows. In Chapter 2 we review the theory behind the minimum 
cut algorithms, including definitions, characterizations of the problem, and descriptions of the al- 
gorithms. In Chapter 3 we discuss general implementation issues and details of each algorithm in 
turn. In Chapter 4 we discuss our experiments, including descriptions of the problem generators 
and results. Note that some readers will want to skip certain portions of this paper. In particular, 
readers who are already familiar with the algorithms may want to skip most of Chapter 2, and
readers who are interested primarily in the bottom line may wish to skip all the way to the results 
section. We give warnings at specific places in the text before particularly complicated and/or 
detailed discussions that many readers will likely want to skip. 




Chapter 2 


Background 


In this chapter we discuss the theory behind minimum cuts. One of the reasons minimum cuts 
are so interesting to study is that the theory behind the different algorithms is so varied. First we 
review approaches based on a reduction of the problem to the maximum flow problem. Next we 
look at algorithms that identify edges that cannot be in the minimum cut and use that information 
to reduce the problem. We conclude with algorithms based on packing trees, a dual problem. 


We begin by introducing some terminology. 


Let $G = (V, E, c)$ be an undirected graph with vertex set $V$, edge set $E$ and non-negative real
edge capacities $c : E \to \mathbb{R}^+$. Let $n = |V|$ and $m = |E|$. We will denote an undirected edge with
endpoints $v$ and $w$ by $\{v, w\}$, but use $c(v, w)$ as shorthand for $c(\{v, w\})$. A cut is a partition of the
vertices into two nonempty sets $A$ and $\bar{A}$. The capacity or value of a cut $c(A, \bar{A})$ is defined by

$$c(A, \bar{A}) = \sum_{u \in A,\, v \in \bar{A},\, \{u, v\} \in E} c(u, v) \tag{2.1}$$


We will sometimes unambiguously refer to a cut just by naming one side, and use the shorthand
$c(A) = c(A, \bar{A})$. Also, if $A = \{v\}$, we may use $c(v)$ instead of $c(A)$. We refer to such cuts as
trivial, and refer to the value $c(v)$ as the capacity of $v$. The edges included in the sum in (2.1) will
be referred to as edges in the cut or edges that cross the cut. A minimum cut of $G$ is a cut $A$ that
minimizes $c(A)$. We use $\lambda(G)$ to denote the value of the minimum cut.
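The definitions above can be made concrete with a brute-force sketch. It is only an illustration under our own assumptions: the edge dictionary format, the function names, and the small example graph are hypothetical and are not taken from the codes studied in this paper. The search is exponential in the number of vertices, so it is useful only for checking small instances.

```python
from itertools import combinations

def cut_capacity(edges, side):
    """Capacity of the cut (side, complement): sum over edges with one endpoint in `side`."""
    side = set(side)
    return sum(c for (u, v), c in edges.items() if (u in side) != (v in side))

def brute_force_min_cut(vertices, edges):
    """Try every proper nonempty vertex subset that contains the first vertex."""
    vertices = list(vertices)
    first, rest = vertices[0], vertices[1:]
    best_value, best_side = float("inf"), None
    # Fixing `first` on side A avoids visiting each partition twice;
    # k < len(rest) guarantees the other side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side = {first, *extra}
            value = cut_capacity(edges, side)
            if value < best_value:
                best_value, best_side = value, side
    return best_value, best_side

# Hypothetical example: a 4-cycle a-b-c-d with a light chord a-c.
example_edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 3, ("d", "a"): 1, ("a", "c"): 1}
example_value, example_side = brute_force_min_cut("abcd", example_edges)
```

In this example the trivial cut at a has capacity 5, while separating {a, b} from {c, d} cuts only the three unit-capacity edges.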


Note that there may be more than one minimum cut. In fact, in a cycle where every edge has
the same capacity, there are $\binom{n}{2}$ of them. (The cycle is actually the worst case. This result is shown
by Dinitz, Karzanov, and Lomonosov [18] and also follows easily from the correctness proof of
KS.) Since we only look for one minimum cut, we sometimes fix one minimum cut and refer to it 
as “the minimum cut”. 
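The count for the cycle can be checked by enumeration on small instances. The following is a sketch under our own assumptions (unit capacities, vertices numbered 0 to n-1, a hypothetical function name); it confirms the count only for the sizes it is run on and proves nothing in general.

```python
from itertools import combinations
from math import comb

def count_min_cuts_of_cycle(n):
    """Return (minimum cut value, number of minimum cuts) of the unit-capacity n-cycle."""
    edges = [(i, (i + 1) % n) for i in range(n)]
    values = []
    # Vertex 0 is always on side A, so each partition is enumerated exactly once.
    for k in range(n - 1):
        for extra in combinations(range(1, n), k):
            side = {0, *extra}
            values.append(sum(1 for u, v in edges if (u in side) != (v in side)))
    best = min(values)
    return best, values.count(best)

# Every minimum cut removes exactly two of the n edges of the cycle.
cycle_results = {n: count_min_cuts_of_cycle(n) for n in range(3, 8)}
```

Each minimum cut corresponds to a choice of two cycle edges to remove, which is where the binomial coefficient comes from.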


2.1 Flow Based Approaches 


The first approach to solving the minimum cut problem was based on a related problem, the 
minimum s-t cut problem. An s-t cut is a cut that has s and t on opposite sides of the partition. 
The minimum s-t cut, $\lambda_{s,t}(G)$, is the s-t cut of minimum value. The well-known max-flow min-cut
theorem [22, 21] implies that a minimum s-t cut can be found by computing the maximum
flow between s and t. In this section we discuss maximum flow and the minimum cut algorithms 
based on it. 


2.1.1 Definitions 


Although we study an undirected version of the minimum cut problem in this paper, flows are 
more natural in directed graphs. We transform an undirected graph into a directed graph in a 
standard way: replace each edge {v, w} by two arcs (v, w) and (w, v), each with the same capacity. 
A cut in a directed graph is defined as in an undirected graph, except that we only count the edges
in one direction. That is, the capacity of a cut $A$ in a directed graph is the sum of the capacities of
the edges crossing from $A$ to $\bar{A}$:

$$c(A, \bar{A}) = \sum_{u \in A,\, v \in \bar{A},\, (u, v) \in E} c((u, v))$$

In general, $c(A, \bar{A})$ is not the same as $c(\bar{A}, A)$ in a directed graph, but since we replace each
undirected edge with two directed edges, one in each direction, the value of every directed cut in the
transformed graph is the same as the corresponding undirected cut in the original graph.
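A minimal sketch of this transformation, assuming a dictionary that maps each undirected edge (stored as a pair) to its capacity; the function names are ours and purely illustrative.

```python
def to_directed(undirected_edges):
    """Replace each edge {v, w} by arcs (v, w) and (w, v) of the same capacity."""
    arcs = {}
    for (v, w), cap in undirected_edges.items():
        arcs[(v, w)] = arcs.get((v, w), 0) + cap
        arcs[(w, v)] = arcs.get((w, v), 0) + cap
    return arcs

def directed_cut_capacity(arcs, side):
    """Sum the capacities of arcs leaving `side`; arcs entering it are ignored."""
    side = set(side)
    return sum(cap for (u, v), cap in arcs.items() if u in side and v not in side)

# Hypothetical triangle used to compare the two directions of one cut.
triangle = {("a", "b"): 3, ("b", "c"): 1, ("a", "c"): 2}
triangle_arcs = to_directed(triangle)
```

Because every edge contributes one arc in each direction, the capacity leaving a side equals the capacity entering it in the transformed graph.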


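Let $G = (V, E, c)$ be a directed graph with two distinguished vertices $s$ (source) and $t$ (sink). A
flow is a function $f : E \to \mathbb{R}$ satisfying

$$f(v, w) \le c(v, w), \quad \forall (v, w) \in E \tag{2.2}$$

$$f(v, w) = -f(w, v), \quad \forall (v, w) \in E \tag{2.3}$$

$$\sum_{v \in V} f(v, w) = 0, \quad \forall w \in V - \{s, t\}. \tag{2.4}$$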


The first condition says that the flow on a directed edge is never more than the capacity of that 
edge. The second says that flow on an edge is antisymmetric: $a$ units of flow on $(u, v)$ implies $-a$
units of flow on $(v, u)$. The final condition says that flow is conserved everywhere but the source
and sink: the flow into each vertex is the same as the flow out of it. 
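The three conditions translate directly into a checker. This is a sketch under our own assumptions: arcs are dictionary keys, the arc set contains both directions of every edge (as produced by the transformation above), and a missing entry in the flow dictionary means zero flow. The names are hypothetical.

```python
def is_flow(vertices, cap, f, s, t):
    """Check capacity, antisymmetry, and conservation for a candidate flow f."""
    for (v, w) in cap:
        if f.get((v, w), 0) > cap[(v, w)]:          # capacity constraint
            return False
        if f.get((v, w), 0) != -f.get((w, v), 0):   # antisymmetry
            return False
    for w in vertices:
        if w in (s, t):
            continue
        if sum(f.get((v, w), 0) for v in vertices) != 0:   # conservation
            return False
    return True

def flow_value(vertices, f, t):
    """Net flow into the sink."""
    return sum(f.get((v, t), 0) for v in vertices)

# Hypothetical path s - a - t with capacity 2 in each direction.
path_vertices = ["s", "a", "t"]
path_cap = {("s", "a"): 2, ("a", "s"): 2, ("a", "t"): 2, ("t", "a"): 2}
good_flow = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}
stuck_flow = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 1, ("t", "a"): -1}
```

The second assignment leaves one unit stranded at a, so it violates conservation.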


We define the residual capacity $c_f(v, w)$ of an edge $(v, w)$ to be $c_f(v, w) = c(v, w) - f(v, w)$. The
residual network $G_f = (V, E_f)$ is the network induced by the edges that have non-zero (i.e. positive)
residual capacity.


The value of a flow is the net flow into the sink, i.e., 


$$|f| = \sum_{v \in V} f(v, t).$$


It can be shown that the third condition, flow conservation, implies that this value is the same as 
the net flow out of the source. 


The maximum flow problem is to determine a flow $f$ for which $|f|$ is maximum. The max-flow
min-cut theorem states that the value of the maximum s-t flow is equal to the value of the
minimum s-t cut, i.e., $|f| = \lambda_{s,t}(G)$. Therefore all the edges of a minimum s-t cut are used up to
capacity by a maximum s-t flow, and it can be shown that any s-t cut that is not minimum always
has some edges with residual capacity. It follows that the vertices reachable from the source by
edges in the residual network define an s-t minimum cut. An s-t maximum flow algorithm can
thus be used to find an s-t minimum cut, and minimizing over all $\binom{n}{2}$ possible choices of $s$ and $t$
yields a minimum cut.
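The reachability argument can be sketched as follows. Note that this example uses shortest augmenting paths found by breadth-first search as a simple stand-in for the maximum flow computation; it is not the push-relabel method discussed in this paper, and the function name and input format are our own assumptions.

```python
from collections import deque

def min_st_cut(cap, s, t):
    """Return (cut value, source side) for a directed capacity dict arc -> capacity."""
    res = dict(cap)                      # residual capacities
    for (v, w) in cap:
        res.setdefault((w, v), 0)        # make sure every reverse arc exists
    adj = {}
    for (v, w) in res:
        adj.setdefault(v, []).append(w)

    def search():
        """BFS from s over arcs with positive residual capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj.get(v, []):
                if w not in parent and res[(v, w)] > 0:
                    parent[w] = v
                    queue.append(w)
        return parent

    value = 0
    while True:
        parent = search()
        if t not in parent:
            # No augmenting path: the reachable vertices form the source side.
            return value, set(parent)
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        delta = min(res[arc] for arc in path)
        for (v, w) in path:
            res[(v, w)] -= delta
            res[(w, v)] += delta
        value += delta

# Hypothetical network in which both arcs out of s end up saturated.
demo_cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
demo_value, demo_side = min_st_cut(demo_cap, "s", "t")
```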


2.1.2 Push-Relabel Methods 


As this paper is primarily concerned with minimum cuts, not maximum flows, we do not wish 
to get too involved with maximum flow algorithms. Conveniently, however, the fastest current 
maximum flow algorithms and the Hao-Orlin minimum cut algorithm are both based on the
push-relabel method, so we review that method here. For a more detailed description see [26].


We begin with some additional definitions. The algorithm maintains a preflow, which is a 
relaxed version of a flow. A preflow satisfies conditions (2.2) and (2.3), and the following relaxation 
of condition (2.4): 


$$\sum_{v \in V} f(v, w) \ge 0, \quad \forall w \in V - \{s, t\} \tag{2.5}$$


So a preflow only has “one-sided” flow conservation. Flow still may not be created at a vertex, 
but now it may be absorbed, because we allow more flow to enter than leave. We define the excess 
at vertex $w$ with respect to preflow $f$ by

$$e_f(w) = \sum_{v \in V} f(v, w).$$


Given a preflow $f$, a distance labeling is a function $d : V \to \mathbb{N}$ that satisfies $d(v) \le d(w) + 1$ for
every $(v, w)$ in the residual graph and $d(s) - d(t) \ge n$. The main point of this definition is that
$d(v) - d(w)$ is always a lower bound on the distance from $v$ to $w$ in the residual graph. In general
we hope for these lower bounds to be close to correct, so that we can use a distance labeling to 
direct flow to the sink along short paths, which at least intuitively is a good thing to do. Another 
way to phrase this intuition is that a distance labeling gives a “locally consistent” estimate on the 
distance to the sink. The idea is that if we maintained exact distances we would be able to route 
flow to the sink on a shortest path, but then we would have to do a lot of work to update the 
labels. By relaxing the conditions on labels so that they are only lower bounds on distances, we 
attempt to get the benefit of having distances without doing so much work. Since the labels give 
lower bounds on distances, $d(v) \ge d(t) + n$ implies that $t$ is not reachable from $v$ in $G_f$, because
all paths have less than $n$ edges. Thus the second condition, $d(s) - d(t) \ge n$, says that the sink is
not reachable from the source, which means that some s-t cut is saturated, which means that if $f$ is
actually a flow then $|f|$ is maximum. We say that an arc $(v, w) \in E_f$ is admissible if $d(v) = d(w) + 1$.
We say that a vertex $v$ is active if the excess $e_f(v) > 0$.


Given a preflow f and a distance labeling d, we define push and relabel operations, which 
update f and d, respectively, as follows. The push operation applies to an admissible arc (v, w) 
where $v$ is active; it increases flow on $(v, w)$ by as much as possible: $\min(c_f(v, w), e_f(v))$. The
relabel operation applies to an active vertex $v$ with no outgoing admissible arcs. It sets $d(v)$ to the
highest value allowed by the distance labeling constraints: one plus the smallest distance label of a 
vertex reachable from v via a residual arc. It is not hard to show that pushes and relabels preserve 
the validity of the distance labeling. 


Push((v, w))
    (applies when (v, w) is a residual arc and d(v) = d(w) + 1)
    send min(c_f(v, w), excess(v)) units of flow along (v, w)
    remove (v, w) and/or add (w, v) to the residual graph if necessary

Relabel(v)
    (applies when v has excess, is not the sink, and for all residual (v, w) has d(v) ≠ d(w) + 1)
    d(v) ← min_{residual (v, w)} d(w) + 1


The generic push-relabel algorithm for finding a minimum s-t cut starts by setting all distance 
labels to zero. Then the algorithm sets $d(s) = 2n - 1$ ¹ and saturates all arcs out of $s$. This action
gives the initial preflow and distance labeling. The algorithm applies push and relabel operations 
in an arbitrary order. When no operation applies, the algorithm terminates. Since one of push or 
relabel will always apply at an active vertex, termination means that there are no active vertices, 
which means that we have a flow. By the arguments above, this flow is maximum. 


GenericPushRelabel(G, s, t)
    for all v, d(v) ← 0
    d(s) ← 2n − 1
    saturate all arcs out of s
    while there exists a vertex with excess
        find a place to apply a Push or Relabel and do so
    return excess at t
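The generic method can be sketched in runnable form as follows. This is an illustration, not the implementation evaluated in this paper: vertices are assumed to be numbered 0 to n-1, capacities are given as a dictionary on arcs, the "arbitrary order" is simply whatever the Python set yields, and the function name is ours. Only vertices other than s and t are treated as active.

```python
def push_relabel_max_flow(n, cap, s, t):
    """Generic push-relabel; returns the excess at t, i.e. the maximum flow value."""
    res = dict(cap)                          # residual capacities
    for (v, w) in cap:
        res.setdefault((w, v), 0)
    out = [[] for _ in range(n)]
    for (v, w) in res:
        out[v].append(w)
    d = [0] * n
    excess = [0] * n
    d[s] = 2 * n - 1                         # n would suffice for a single s-t computation

    def push(v, w, delta):
        res[(v, w)] -= delta
        res[(w, v)] += delta
        excess[v] -= delta
        excess[w] += delta

    for w in out[s]:                         # saturate all arcs out of s
        push(s, w, res[(s, w)])
    active = {v for v in range(n) if v not in (s, t) and excess[v] > 0}
    while active:
        v = next(iter(active))               # arbitrary active vertex
        for w in out[v]:
            if res[(v, w)] > 0 and d[v] == d[w] + 1:   # admissible arc
                push(v, w, min(res[(v, w)], excess[v]))
                if w not in (s, t):
                    active.add(w)
                break
        else:
            # No admissible arc: relabel to one plus the smallest residual neighbor label.
            d[v] = 1 + min(d[w] for w in out[v] if res[(v, w)] > 0)
        if excess[v] == 0:
            active.discard(v)
    return excess[t]
```

On the two-arc path with capacities 5 and 2, for example, three units of excess cannot reach the sink and are pushed back to the source once the middle vertex has been relabeled above it.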


We get time bounds by counting the number of push and relabel operations. It is easy to
show that distance labels only increase and are always $O(n)$, from which it follows that there can
be only $O(n^2)$ relabels. Each relabel requires looking at the outgoing edges of a vertex, so the total
relabeling time is $\sum_v O(n) \cdot \mathrm{degree}(v) = O(nm)$. We account for pushes by distinguishing pushes
that saturate, that is, use all the residual capacity of a residual arc, and those that do not. After a
saturating push on an arc, it is no longer part of the residual graph, and it cannot return to the
residual graph until there is a push on the reverse arc. But for the reverse arc to be admissible,
we must relabel one endpoint. Thus there can be only $O(n)$ saturating pushes per arc, giving a
total of $O(nm)$ saturating pushes. It remains to bound the number of non-saturating pushes. It is
possible to give a generic bound of $O(n^2 m)$ on the number of non-saturating pushes, but we can
get better bounds by considering variations on the algorithm.


At a high level, push-relabel algorithms differ by the order in which they apply push and re- 
label operations. One convenient way of ordering the operations is to define a discharge operation, 
which combines the push and relabel operations at a low level. The discharge operation applies 
to an active vertex v. The operation applies push operations to arcs out of v and relabel operations 
to v until v is no longer active. 


¹ For an s-t cut computation, we can set $d(s) = n$; the higher value is needed for the Hao-Orlin algorithm.
Discharge(v)
    (applies when v has excess)
    while v has excess
        if v.currentArc = NIL, Relabel(v)
        else Push(v.currentArc)


We can now specify an ordering of pushes and relabels by giving a strategy for selecting the 
next active vertex to discharge. One possibility is the highest label strategy: discharge an active 
vertex with the highest distance label. This strategy, in combination with appropriate heuristics, 
seems to give the best results in practice [13]. It also permits a better bound on the number of 
non-saturating push operations: $O(n^2 \sqrt{m})$ [11].


HighestLabelPushRelabel(G, s, t)
    for all v, d(v) ← 0
    d(s) ← 2n − 1
    saturate all arcs out of s
    while there exists an active vertex
        Discharge(an active vertex with maximum distance label)
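A sketch of the highest label strategy with an explicit discharge operation follows. It rests on several assumptions of ours: the current arc of a vertex is modeled as an index into its arc list (reaching the end of the list plays the role of NIL), an arc that is not admissible is skipped by advancing the index, and the highest active vertex is found by a linear scan. Efficient codes would use buckets of vertices indexed by label instead of a scan, so this version shows the ordering only, not the stated time bound.

```python
def highest_label_max_flow(n, cap, s, t):
    """Highest-label push-relabel with a per-vertex current arc; returns the flow value."""
    res = dict(cap)
    for (v, w) in cap:
        res.setdefault((w, v), 0)
    out = [[] for _ in range(n)]
    for (v, w) in res:
        out[v].append(w)
    d = [0] * n
    excess = [0] * n
    current = [0] * n                        # index of the current arc of each vertex
    d[s] = 2 * n - 1

    def push(v, w, delta):
        res[(v, w)] -= delta
        res[(w, v)] += delta
        excess[v] -= delta
        excess[w] += delta

    def discharge(v):
        while excess[v] > 0:
            if current[v] == len(out[v]):    # current arc is NIL: relabel and restart
                d[v] = 1 + min(d[w] for w in out[v] if res[(v, w)] > 0)
                current[v] = 0
            else:
                w = out[v][current[v]]
                if res[(v, w)] > 0 and d[v] == d[w] + 1:
                    push(v, w, min(res[(v, w)], excess[v]))
                else:
                    current[v] += 1          # arc not admissible: advance

    for w in out[s]:
        push(s, w, res[(s, w)])
    while True:
        active = [v for v in range(n) if v not in (s, t) and excess[v] > 0]
        if not active:
            return excess[t]
        discharge(max(active, key=lambda v: d[v]))
```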


We now sketch the proof of this time bound. Call the time between successive relabels a phase. 
At the end of a phase, for any vertex that has excess that was moved during the phase, we can 
identify a “trajectory” of non-saturating pushes that contributed to the excess. These trajectories 
end either at an edge that had a saturating push, or a vertex that had done no pushes since it was 
last relabeled. The key observation is that a non-saturating push from a vertex with the highest 
label makes that vertex inactive and, since there are no higher labeled vertices that could push to 
it, it must stay inactive at least until a relabel occurs. Thus these trajectories are vertex disjoint. 
So there can only be $\sqrt{m}$ trajectories longer than $n/\sqrt{m}$ in a phase, totaling $\ell \sqrt{m}$ non-saturating
pushes for a phase in which the maximum distance label (over active vertices) drops by $\ell$. But the
total increase in distance labels is only $O(n^2)$, so the total decrease is the same, giving a bound of
$O(n^2 \sqrt{m})$ on the number of non-saturating pushes in trajectories longer than $n/\sqrt{m}$. And since
each trajectory ends at an edge that had a saturating push or a vertex that was newly relabeled,
the total number of trajectories is only $O(nm)$, so the total number of non-saturating pushes in
trajectories shorter than $n/\sqrt{m}$ is also only $O(n^2 \sqrt{m})$.


Another possibility is to use a FIFO queue to order discharge operations. In conjunction with 
dynamic trees, a sophisticated data-structure that makes it possible to do many non-saturating 
pushes at once, this method gives the best known time bound: $O(nm \log(n^2/m))$ [26]. We did not
implement this version, and the full description is rather involved, so we do not give it here. 


We assume that a relabel operation always uses the gap relabeling heuristic [12, 16]. This heuris- 
tic often speeds up push-relabel algorithms for the maximum flow problem [2, 13, 16, 50] and is 
essential for the analysis of the Hao-Orlin algorithm. Gap relabeling is based on the observation
that if there is no vertex with distance label x, then no excess at a vertex with distance label greater
than x can reach the sink. (Consider applying discharge operations to these vertices before apply- 
ing discharge operations to any of the other vertices. Since there is no vertex with distance label x, 
it will never be possible to push any excess to a vertex with label less than x. It follows that all this 
excess must return to the source.) We exploit this observation as follows. Just before relabeling 
v, check if any other vertex has label d(v). If the answer is yes, then relabel v. Otherwise, delete
all vertices with distance label greater than d(v). Note we have written the above as it applies to 
finding minimum cuts; if we actually want the flow then we cannot delete active vertices, but we 
can assign label d(s) to all the vertices with label greater than d(v), which is still helpful. 


GapRelabel(v)
    (conditions for Relabel apply)
    if v is the only vertex with distance label d(v)
        remove all w with d(w) > d(v)
    else Relabel(v)


2.1.3 The Gomory-Hu Algorithm


In 1961, Gomory and Hu [27] showed that $\lambda_{s,t}(G)$ for all $\binom{n}{2}$ pairs of $s$ and $t$ could actually be
computed using only $n - 1$ maximum flow computations. Their method immediately yields an
algorithm for computing minimum cuts using only $O(n)$ maximum flow computations. Note that
since Gomory and Hu considered directed graphs and actually compute all $\lambda_{s,t}(G)$, they solve a
more general problem.


We can see more directly that O(n) maximum flow computations suffice to compute a mini- 
mum cut. Fix some vertex s arbitrarily. In the minimum cut, there is some vertex t on the other 
side of the partition. For this t, the s-t minimum cut is clearly the same as the minimum cut. There- 
fore we can find the minimum cut by finding the minimum (over t) of minimum s-t cuts. This 
algorithm computes a minimum cut with $n - 1$ minimum s-t cut computations. For the purposes
of later discussion, we refer to this simplified algorithm as GH. 
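The simplified algorithm can be sketched in a few lines once an s-t minimum cut routine is available. In this illustration the s-t routine uses breadth-first augmenting paths on a dense residual matrix, which is our own simple stand-in rather than the push-relabel codes discussed in this paper; vertex numbering and function names are likewise assumptions.

```python
from collections import deque

def st_min_cut_value(n, edges, s, t):
    """Value of a minimum s-t cut of an undirected graph, via BFS augmenting paths."""
    res = [[0] * n for _ in range(n)]
    for (v, w), c in edges.items():
        res[v][w] += c                       # each edge becomes two arcs
        res[w][v] += c
    value = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            v = queue.popleft()
            for w in range(n):
                if parent[w] == -1 and res[v][w] > 0:
                    parent[w] = v
                    queue.append(w)
        if parent[t] == -1:                  # sink unreachable: flow is maximum
            return value
        delta, w = float("inf"), t
        while w != s:                        # bottleneck on the path found
            delta = min(delta, res[parent[w]][w])
            w = parent[w]
        w = t
        while w != s:                        # augment along the path
            res[parent[w]][w] -= delta
            res[w][parent[w]] += delta
            w = parent[w]
        value += delta

def gh_min_cut_value(n, edges):
    """Fix s = 0 and minimize the s-t cut value over the other n - 1 choices of t."""
    return min(st_min_cut_value(n, edges, 0, t) for t in range(1, n))
```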


2.1.4 The Hao-Orlin Algorithm 


A natural question to ask about GH is whether some of the information computed in one maxi- 
mum flow computation can be reused in the next one. Hao and Orlin answer this question in the 
affirmative. The key new idea is to use a push-relabel maximum flow algorithm to implement 
GH, and use the preflow and distance labeling from the last max-flow computation as a starting 
point for the current one. This method allows us to amortize the work of the $n - 1$ s-t cut
computations to obtain a worst-case time bound that is asymptotically the same as the bound for one
maximum flow computation. We give a brief description of this algorithm below. See [31] for 
details. Note that the algorithm given by Hao and Orlin applies to directed graphs, as did the 
original Gomory-Hu algorithm. As with GH, we ignore those details in this discussion. 


A key concept of the Hao-Orlin algorithm is that of a sleeping layer of vertices. A sleeping layer 
is a set of vertices that do not participate in the current flow computation; there can be multiple 
such layers. A vertex is asleep if it belongs to a sleeping layer and awake otherwise. Initially all 
vertices are awake. When gap relabeling discovers a set of vertices disconnected from the sink, 
these vertices form a new sleeping layer. This layer is deleted from the graph and put on a stack 
of layers. When a layer of vertices is put to sleep, the values of the vertex distance labels are the 
same as they were just before the relabeling operation during which the layer was discovered. At 
some point during the execution of the algorithm, the top layer will be popped from the stack and
the vertices of this layer will become awake. The point of the sleeping layers is that at the time 
we find them they are not relevant to the current flow computation, but we have done work to 
get their distance labels to the current state, so we save this information for use in a later flow 
computation. 


The Hao-Orlin algorithm starts as follows. We select the first source and sink arbitrarily. We 
set the distance label of the source to $2n - 1$ and saturate all arcs out of the source. Distance labels
of all other vertices are set to zero. Then we start the first s-t cut computation. After an s-t cut
computation terminates, we examine the cut it finds and remember the cut if its capacity is smaller
than that of the best cut we have seen so far. Then we start the next computation as follows. First
we set the distance label of $t$ to $2n - 1$ and saturate all of its outgoing arcs. This effectively makes
it part of the source, so we refer to such vertices as source vertices. Next we look for a new sink. 
If there are no non-source, awake vertices, we awaken the top sleeping layer. We now pick the 
non-source, awake vertex with the smallest distance label as the new sink. If we cannot find a new 
sink because there are no non-source, awake vertices and there are no more sleeping layers, then 
all vertices are source vertices and we are done. 


HO(G)
    $\hat{\lambda}$ ← ∞
    designate some vertex s, give it label 2n − 1, and saturate all of its outgoing arcs
    while there are non-source vertices
        if there are no awake vertices, awaken the top sleeping layer
        pick the awake vertex with minimum distance label as t
        PushRelabel(G, s, t) (always using GapRelabel, not Relabel)
        if the excess at t is less than $\hat{\lambda}$
            $\hat{\lambda}$ ← excess at t
        designate t a source vertex, and saturate all of its outgoing edges
    return $\hat{\lambda}$


It is not hard to check that the distance labels remain valid throughout the computation, which 
implies the correctness of the algorithm. Likewise, as in the maximum flow context, the distance 
labels are O(n) and only increase. It follows that using highest label selection, the time bound for 
HO is $O(n^2 \sqrt{m})$. The proof for FIFO selection with dynamic trees also carries over, giving a time
bound of $O(nm \log(n^2/m))$.


2.2 Contraction 


Another way to approach the minimum cut problem is to try to identify vertices that are on the 
same side of the minimum cut. Given two such vertices, we would like to reduce the problem. 
This motivates the following definition: 


Given a graph G and vertices v and w, we create G/{v, w}, the contraction of v and w, by merg- 
ing v and w into one node. That is, v and w cease to be discernible vertices; there is only a node 
representing the two of them, which has as its neighbors the union of the neighbors of v and the 
neighbors of w. If {v, w} € E, we often refer to the contraction of v and w as the contraction of edge 
{v, w}. Multiple edges are preserved by this operation, at least in terms of capacity. That is, if v and
w have a common neighbor u, then in G/{v, w} either {vw, u} has capacity c(v, u) + c(w, u) or there
are two edges {vw, u}, one with capacity c(v, u) and the other with capacity c(w, u). These two
views are equivalent in theory; which paradigm to use when representing the graph in practice is 
an implementation detail. 
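As an illustration of the first of these two views (parallel edges merged by adding capacities), contraction might be sketched as below. The representation is an assumption of ours: the graph is a symmetric dictionary of dictionaries, and each node is a frozenset of the input vertices it represents, so contracted nodes are simply unions.

```python
def make_graph(edges):
    """Build a symmetric adjacency map whose nodes are singleton frozensets."""
    graph = {}
    for (v, w), c in edges.items():
        a, b = frozenset([v]), frozenset([w])
        graph.setdefault(a, {})[b] = c
        graph.setdefault(b, {})[a] = c
    return graph

def contract(graph, v, w):
    """Return a new graph in which nodes v and w are merged into one node."""
    merged = v | w
    new_graph = {}
    for x, nbrs in graph.items():
        if x in (v, w):
            continue
        new_nbrs = {}
        for y, c in nbrs.items():
            key = merged if y in (v, w) else y
            new_nbrs[key] = new_nbrs.get(key, 0) + c    # add up parallel edges
        new_graph[x] = new_nbrs
    # The merged node mirrors its neighbors' entries; the edge {v, w} itself disappears.
    new_graph[merged] = {x: nbrs[merged] for x, nbrs in new_graph.items() if merged in nbrs}
    return new_graph

# Hypothetical example: a and b share the neighbor c.
g = make_graph({("a", "b"): 5, ("a", "c"): 2, ("b", "c"): 3, ("c", "d"): 1})
A, B, C = frozenset("a"), frozenset("b"), frozenset("c")
g_ab = contract(g, A, B)
```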


Note that although the terms node and vertex are often used interchangeably, we make a dis- 
tinction for the purposes of talking about contracted graphs. We use the term node for the base 
set of a graph potentially created by a contraction operation, and the term vertex for the input. So 
after any series of contractions, nodes correspond to sets of vertices of the input graph. 


The key property that we want from the contraction operation is captured in the following 
two lemmas: 


Lemma 2.2.1 Given a network $G$ and two nodes $v$ and $w$, if $v$ and $w$ are on the same side of some minimum
cut, then $\lambda(G) = \lambda(G/\{v, w\})$.


Proof. The proof is immediate from the fact that no new cuts are created by contraction and that
by assumption none of the edges of some minimum cut is contracted. ∎


Lemma 2.2.2 Given a network $G$ and two nodes $v$ and $w$, $\lambda(G) = \min\{\lambda(G/\{v, w\}), \lambda_{v,w}(G)\}$.


Proof. If $v$ and $w$ are on the same side of some minimum cut, then $\lambda(G) = \lambda(G/\{v, w\})$ by
Lemma 2.2.1. Otherwise $v$ and $w$ are on opposite sides of every minimum cut, so $\lambda(G) = \lambda_{v,w}(G)$
by definition. ∎


So given two vertices on the same side of any minimum cut, contraction produces a smaller 
graph with the same minimum cut. Further, given a minimum v-w cut, we can find a minimum 
cut by taking the smaller of the v-w cut we have and the minimum cut of G/{v,w}. From these 
observations we immediately get a high level minimum cut algorithm: 


GenericContractCut(G)
    $\hat{\lambda}$ ← ∞
    while G has more than one node
        Either
            1. identify an edge {v, w} that is not in some minimum cut
            2. compute $\lambda_{v,w}(G)$ for some v and w and set $\hat{\lambda}$ ← min{$\lambda_{v,w}(G)$, $\hat{\lambda}$}
        G ← G/{v, w}
    return $\hat{\lambda}$


Since a contraction reduces the number of nodes by one, this algorithm requires $n - 1$ iterations
of the while loop. 


We assume that our algorithms always keep track of the minimum cut seen so far, as in
GenericContractCut, so we refer to an edge $\{v, w\}$ as contractible if it is not in some minimum cut or if we
already know $\lambda_{v,w}(G)$.


Note that GH can be modified to fit in this framework, because a maximum flow computation 
identifies a contractible edge (always option 2). Thus we use $n - 1$ flow computations. Likewise
in HO, we can contract the source and the sink at the end of each flow computation. Actually,
HO already does contractions implicitly by designation of source vertices, but for the purposes of 
adding heuristics it turns out to be desirable to think about doing the contractions explicitly. 


In the remainder of this section we describe several other ways to identify contractible edges. 
First we discuss local tests for contractibility given by Padberg and Rinaldi. These do not always 
apply, so they do not result in a minimum cut algorithm, but they are excellent heuristics. Then 
we discuss an algorithm of Nagamochi and Ibaraki that identifies at least one contractible edge by 
a graph search. Finally we discuss an algorithm of Karger and Stein, which shows that “guessing” 
contractible edges is good enough for a high probability of success. 


2.2.1 The Padberg-Rinaldi Heuristics 


In their implementation study of minimum cut algorithms [51], Padberg and Rinaldi introduced
several local tests for identifying contractible edges. The point is to try to take option 1 of
GenericContractCut whenever possible. Used in GH, every time a test finds a contractible edge, we save
one maximum flow computation. If the tests do not identify any contractible edges, then we have
no choice but to use the flow computation. Since maximum flow computations are expensive, fast
tests for contractibility are a big win, even if sometimes they do not apply, because we do not
lose much if they fail and we gain a lot if they pass.


In their paper, Padberg and Rinaldi give a very general class of tests. Some of these would 
be quite time consuming, and in fact, could dominate the running time of the new minimum cut 
algorithms. We single out the four cheapest tests, which are reasonable to use. We refer to these 
as PR tests or PR heuristics. We say that a test passes if one of the conditions is satisfied, which 
implies that the edge is contractible. Recall for the following formulas that c(v) denotes the total 
capacity incident to vertex v. 


Lemma 2.2.3 [51] Let $\hat{\lambda}$ be an upper bound on $\lambda(G)$. If v, w ∈ V satisfy any of the following conditions:


PR1 $c(v,w) \ge \hat{\lambda}$


PR2 $c(v) \le 2c(v,w)$


PR3 $\exists u$ such that $c(v) \le 2(c(v,w) + c(v,u))$ and $c(w) \le 2(c(v,w) + c(w,u))$


PR4 $c(v,w) + \sum_u \min(c(v,u), c(w,u)) \ge \hat{\lambda}$


then one of the following conditions must hold:

1. v and w are on the same side of some minimum cut.

2. {v} is a minimum cut.

3. {w} is a minimum cut.

4. There is only one minimum cut and {v, w} is the only edge that crosses it.




5. There is only one minimum cut and the edges whose capacities are included in the sum for test PR4 
are the only edges that cross it. 


Written mathematically, these tests are difficult to interpret, but they are actually fairly intu- 
itive. This intuition comes out best in the proof, so we give that now. 


Proof. PR1 says that if we have an edge with capacity greater than an upper bound on the min- 
imum cut value, then it is not in some minimum cut. This result is immediate from the fact that 
the value of any cut including {v, w} is at least c(v, w). If the capacity and bound are equal, then a 
minimum cut that includes {v, w} can have no other edges. Thus if there is another minimum cut 
it cannot include {v, w}, and otherwise we have that condition 4 holds. 




Figure 2.1: PR2: if $c(v,w) \ge c(v)/2$, then the cut on the right is no bigger than the cut on the left.


PR2 says that if we have an edge {v, w} with capacity at least half the capacity of v, then either 
it is not in some minimum cut or {v} is a minimum cut. To see this, consider such an edge (see 
Figure 2.1). If it does not cross some minimum cut then condition 1 holds, so suppose it crosses 
every minimum cut. Fix one of the minimum cuts. What happens if we move v to the other side 
of the vertex partition? The cut value loses c(v, w) and may gain as much as c(v) - c(v, w). But the
second quantity is at most c(v, w), so we cannot make the cut value larger, which contradicts the
assumption that {v, w} crosses every minimum cut. Of course, if {v} is the minimum cut, then we
cannot move v across the partition, but that is the only other possibility. Note that by symmetry,
PR2 also passes if $c(w) \le 2c(v, w)$.


PR3 is a more complicated form of PR2. Again, consider an edge where the conditions hold. 
Suppose u is on the same side of some minimum cut as v. Then by Lemma 2.2.1 we can consider 
G/{u, v}, because it has the same cut value. This merges {u, w} and {v, w}, so the assumption that 
c(w) < 2(c(v, w) +c(w, u)) implies that PR2 applies at w in the contracted graph. Now suppose u 
is not on the same side of the minimum cut as v; that is, it is on the same side as w. By symmetry, 
the same argument applies. So either way, {v, w} does not cross some minimum cut or one of {v} 
or {w} is a minimum cut. (See Figure 2.2) 


PR4 is a generalization of PR1. We consider {v, w} and all of the length two paths between v
and w (see Figure 2.3). Any cut separating v and w must include {v, w} and at least one edge from
each of the paths of length two. The test computes the minimum value that this quantity can
have; clearly if it is greater than an upper bound on the cut value, then v and w cannot be on
opposite sides of any minimum cut. And as in PR1, if the test is met with equality it is possible that the
edges we have summed over are the only edges of the unique minimum cut, which means that
condition 5 holds. Note that any edge that passes PR1 will pass PR4, but we distinguish the two tests
because PR1 is cheaper to compute. ∎


Figure 2.2: PR3: if $c(v,w) + c(v,u) \ge c(v)/2$ and $c(v,w) + c(w,u) \ge c(w)/2$, then for either case of
a cut containing {v, w} (on the left), there is a cut that does not contain {v, w} and is no bigger (on
the right).


Figure 2.3: PR4: any cut separating v and w must also cut one edge from each of the paths of
length two.


We still have not argued that these tests can help us. Observe first that conditions 4 and 5 are
technicalities that will not concern us. In order to have $\hat{\lambda}$ set to $\lambda(G)$ we must have already found a
minimum cut, so there is no question of missing the unique minimum cut by ignoring possibilities
4 and 5. Thus when {v, w} passes a PR test, it either is not in some minimum cut, or one of {v} or
{w} is a minimum cut. So as long as we check c(v) and c(w) and update $\hat{\lambda}$ if appropriate, {v, w} is
contractible if it passes a PR test.
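To make the four tests concrete, here is a minimal Python sketch that checks them for a single edge. The graph representation and the names `capacity`, `degree`, `pr_test`, and `unit_graph` are assumptions made for this example; the sketch recomputes c(v) on every call and makes no attempt at the efficiency a real implementation needs. The caller is still responsible for comparing c(v) and c(w) against the current upper bound before contracting.

```python
def capacity(cap, a, b):
    """c(a, b), or 0 if a and b are not adjacent."""
    return cap.get(frozenset((a, b)), 0)

def degree(cap, v):
    """c(v): total capacity incident to v."""
    return sum(c for e, c in cap.items() if v in e)

def pr_test(cap, v, w, lam_hat):
    """Name of the first PR test that {v, w} passes, or None if all four fail."""
    cvw = capacity(cap, v, w)
    cv, cw = degree(cap, v), degree(cap, w)
    others = set().union(*cap) - {v, w}
    if cvw >= lam_hat:
        return "PR1"
    if cv <= 2 * cvw or cw <= 2 * cvw:
        return "PR2"
    if any(cv <= 2 * (cvw + capacity(cap, v, u)) and
           cw <= 2 * (cvw + capacity(cap, w, u)) for u in others):
        return "PR3"
    if cvw + sum(min(capacity(cap, v, u), capacity(cap, w, u)) for u in others) >= lam_hat:
        return "PR4"
    return None

def unit_graph(pairs):
    """Unit-capacity graph from an iterable of vertex pairs."""
    return {frozenset(p): 1 for p in pairs}

heavy = {frozenset("ab"): 5, frozenset("bc"): 1, frozenset("ac"): 1}
square = unit_graph(["ab", "bc", "cd", "da"])
k4 = unit_graph(["ab", "ac", "ad", "bc", "bd", "cd"])
```

In each example the upper bound passed in would be the minimum degree of the graph (2 for `heavy` and `square`, 3 for `k4`).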


Note that we need to be more careful if we want to find all minimum cuts. In that case we 
would want condition 1 strengthened to a guarantee that {v, w} is not in any minimum cut. It is 
easy to show that we can get that condition by making the inequalities in the tests strict. Likewise, 
if we are interested in finding near-minimum cuts, we must relax the tests further. In particular, 
if we want to find all cuts with value at most $\alpha\lambda(G)$, we must introduce $\alpha$ into the inequalities of
the tests. This detail is only an issue for KS and K, which are capable of finding all minimum cuts 
in the same time bounds, and can be extended to find near-minimum cuts. 


It is not hard to see that it is possible to have a graph where none of these tests apply. Consider 
an uncapacitated graph with minimum cut at least 2 (PR1 fails), where each vertex has degree at 
least 3 (PR2 fails), and there are no triangles (PR3 and PR4 fail). An example is a cycle on n - 2
nodes with an extra vertex connected to every other node on the cycle and another extra vertex 
connected to the remaining nodes (see Figure 2.4). We call this graph a bicycle wheel (because of 
the heavy rim and light, interleaved spokes), and in fact test our codes on it. 


Figure 2.4: A graph on which all PR tests fail. 
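The bicycle wheel is easy to generate. The sketch below is one possible construction, with the vertex numbering and the function names chosen for this example only; it also checks the two structural properties used in the argument above (minimum degree at least 3 and no triangles). It requires an even n of at least 8 so that each hub has at least three spokes.

```python
from itertools import combinations

def bicycle_wheel(n):
    """Unit-capacity bicycle wheel: rim vertices 0..n-3, hubs n-2 and n-1."""
    rim = n - 2
    if n < 8 or n % 2:
        raise ValueError("this sketch needs an even n >= 8")
    edges = {frozenset((i, (i + 1) % rim)) for i in range(rim)}   # the rim cycle
    edges |= {frozenset((i, rim + i % 2)) for i in range(rim)}    # interleaved spokes
    return edges

def min_degree(edges):
    verts = set().union(*edges)
    return min(sum(v in e for e in edges) for v in verts)

def has_triangle(edges):
    verts = sorted(set().union(*edges))
    return any(all(frozenset(p) in edges for p in combinations(t, 2))
               for t in combinations(verts, 3))

wheel = bicycle_wheel(8)
```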


2.2.2 The Nagamochi-Ibaraki Algorithm 


Other than by computing maximum flows, we still have not given a guaranteed way to identify 
a contractible edge. As discussed above, given a subroutine to find a contractible edge, we get a 
minimum cut algorithm: repeatedly find a contractible edge and contract it until the graph has 
only two vertices. This method requires n — 2 calls to the subroutine. So the question is whether 
there is a fast such subroutine. Nagamochi and Ibaraki [47] show that there is a surprisingly 
simple one. 


For the purposes of intuition, consider uncapacitated multigraphs. Suppose we have a graph 
with minimum cut value one. Then the graph is connected, so we can find a spanning tree. For
every minimum cut, the one edge of the cut must be in the tree, or else it is not spanning. Thus all
the minimum cut edges are included in this spanning tree. Any other edges are not in any minimum
cut, and are therefore contractible. Generalizing this idea, we define a sparse k-connectivity
certificate to be a subgraph with at most kn edges in which the value of any cut is at least the minimum of
k and the cut's original value. So for $k > \lambda$, if we find a sparse k-connectivity certificate, any edge
not in the certificate is contractible. Further, as suggested by the example above, we can find such 
a subgraph by repeatedly (k times) finding a maximal spanning forest and removing those edges 
from further consideration. 


There are still two problems with this idea. First, in a capacitated graph, it is not efficient 
to repeat any step $\lambda$ times. Second, it is not guaranteed that we find a contractible edge, as the
certificate may contain all of the original edges. 


Nagamochi and Ibaraki solve these two problems simultaneously. They use a graph search 
called scan-first search that finds all of the maximal spanning forests in one pass over the graph, 
and they show that this search also finds a minimum v-w cut for some v and w, which guarantees 
that we will be able to do at least one contraction. Note that it is possible to invert this perspective. 
We can say that Nagamochi and Ibaraki gave an algorithm that efficiently finds a minimum v-w 
cut and also happens to find a sparse connectivity certificate, which provides a good heuristic for 
obtaining more contractions. It is a mistake, however, to proceed to ignore this heuristic, as it 
sometimes allows NI to finish in one search, instead of n — 1. 


In more detail, we build the maximal spanning forests by visiting the vertices in some order. 
When we visit a vertex, we assign to appropriate spanning forests all unassigned incident edges. 
In order to assign edges to forests, we keep track for each vertex v of the maximum forest to which
any incident edge has been assigned (r(v)). Let $E_i$ be the edges of the ith maximal spanning forest. The
remarkable theorem proved by Nagamochi and Ibaraki is that if we always pick the vertex with
maximum r(v) to visit next, and assign incident edge {v, w} to $E_{r(w)+1}$, then we get the desired
spanning forests.


ScanFirstSearch(G)
    for each v ∈ V
        $r(v) \leftarrow 0$
        mark v unvisited
    for each e ∈ E
        mark e unassigned
    $E_1 \leftarrow E_2 \leftarrow \cdots \leftarrow E_{|E|} \leftarrow \emptyset$
    while there is an unvisited node
        $v \leftarrow$ the unvisited node with largest r(v)
        for each unassigned {v, w}
            $E_{r(w)+1} \leftarrow E_{r(w)+1} \cup \{\{v, w\}\}$
            if (r(v) = r(w)) $r(v) \leftarrow r(v) + 1$
            $r(w) \leftarrow r(w) + 1$
            mark {v, w} assigned
        mark v visited
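A direct Python transcription of the search for an uncapacitated multigraph without self-loops might look as follows. It is a sketch under stated assumptions: vertices are numbered 0 to n - 1, the next vertex is found by a linear scan rather than a priority queue, and the names are chosen for this example.

```python
def scan_first_search(n, edges):
    """Partition the edges into forests; forests[i] holds the edges of E_(i+1).

    Returns (forests, order), where order is the sequence of visited vertices.
    """
    incident = [[] for _ in range(n)]
    for idx, (a, b) in enumerate(edges):
        incident[a].append(idx)
        incident[b].append(idx)
    r = [0] * n
    visited = [False] * n
    assigned = [False] * len(edges)
    forests, order = [], []
    for _ in range(n):
        # Visit the unvisited vertex with largest r (ties: lowest number).
        v = max((u for u in range(n) if not visited[u]), key=lambda u: r[u])
        for idx in incident[v]:
            if assigned[idx]:
                continue
            a, b = edges[idx]
            w = b if a == v else a
            if r[w] == len(forests):
                forests.append([])
            forests[r[w]].append(edges[idx])   # assign {v, w} to E_(r(w)+1)
            if r[v] == r[w]:
                r[v] += 1
            r[w] += 1
            assigned[idx] = True
        visited[v] = True
        order.append(v)
    return forests, order

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
forests, order = scan_first_search(4, k4)
```

On the complete graph with four vertices the search produces three forests, and the last edge assigned, {2, 3}, lands in the third forest, matching the degree of the last vertex visited.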


We now sketch the proof that scan-first search finds the desired forests. First, note that we do 
indeed maintain the property that r(w) is the maximum forest in which w has an incident edge. It
follows that by adding {v, w} to $E_{r(w)+1}$, we can never create a cycle, so we do indeed find forests.
It remains to argue that when we add {v, w} to forest i + 1, there is a path from v to w in forest i. (If
this statement is true, we are always adding each edge to the first forest in which it does not create
a cycle, which implies that none of the previous forests could have been made larger by adding
this edge, which means that we are properly simulating the idea of repeatedly finding maximal
spanning forests and deleting them.) When assigning an edge to a tree, consider also directing it
from the vertex being visited to the unvisited vertex. From the way we assign edges to forests, it is
clear that each vertex has in-degree at most one in a given forest. Further, if we have two distinct
trees in a given forest, each with at least one edge, then they cannot become connected, because
the roots have already been visited, so there are no more unassigned incident edges to add there.
Now, when we add {v, w} to forest i + 1, we have r(v) and r(w) each at least i, so both of v and
w have an incident edge in forest i. Assume for contradiction that there is no path between v and
w in forest i, in which case they are in distinct trees. Let $v_0$ ($w_0$) be the root of the tree containing
v (w), and assume without loss of generality that $v_0$ is visited before $w_0$. Let $v_0, v_1, \ldots, v_k = v$ be
the unique path from $v_0$ to v in the tree containing v. After visiting $v_0$, $r(v_1)$ is at least i, and $r(w_0)$
is at most i - 1, since we assume $w_0$ is visited second. So we now visit $v_1$ before $w_0$, because it has
a larger r. Repeating this argument, it follows that we visit v before we visit $w_0$, but in order for
$w_0$ to be the root of the tree containing w, it must be visited before v, so we have a contradiction.


We now direct our attention to the second claim, that scan-first search also finds a minimum 
v-w cut for some v and w. Consider the last edge assigned. This edge necessarily connects the 
next-to-last vertex visited (v) to the last vertex visited (w). Each edge of w is clearly added to a
different forest, so the {v, w} edge is assigned to $E_{c(w)}$. But this means that any cut containing {v, w}
has value at least c(w), or else we could have put the edge in an earlier forest. In particular, this
means that $\lambda_{v,w} \ge c(w)$. Since $\lambda_{v,w}$ cannot be more than c(w), we have that in fact $\lambda_{v,w} = c(w)$.


Thus we always compute $\lambda_{v,w}(G)$ for some v and w and can therefore always take option 2
of GenericContractCut. Note that unlike GH, NI does not pick v and w, it just finds the minimum 
v-w cut for some v and w. Further, it is possible that we find many edges to which we can apply 
option 1 of GenericContractCut. 
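To illustrate how option 2 alone already yields a minimum cut algorithm, the following Python sketch repeatedly visits vertices in order of largest total capacity to the visited set (the capacitated analogue of picking the largest r), records c(w) for the last vertex w, and contracts the last two vertices. It assumes a connected graph with positive capacities, ignores the certificate heuristic entirely, and uses a linear scan instead of a priority queue; the names are chosen for this example and this is not the implementation described in Section 3.4.

```python
def min_cut_by_search(cap):
    """Minimum cut value; cap maps frozenset({a, b}) to a positive capacity."""
    adj = {}
    for e, c in cap.items():
        a, b = tuple(e)
        adj.setdefault(a, {})[b] = c
        adj.setdefault(b, {})[a] = c
    best = float("inf")
    while len(adj) > 1:
        r = {u: 0 for u in adj}          # capacity from u to the visited set
        order = []
        while r:
            v = max(r, key=r.get)        # unvisited vertex with largest r
            del r[v]
            order.append(v)
            for w, c in adj[v].items():
                if w in r:
                    r[w] += c
        v, w = order[-2], order[-1]
        best = min(best, sum(adj[w].values()))   # lambda_{v,w}(G) = c(w)
        for u, c in adj.pop(w).items():          # contract {v, w} into v
            del adj[u][w]
            if u != v:
                adj[v][u] = adj[v].get(u, 0) + c
                adj[u][v] = adj[v][u]
    return best

triangle = {frozenset("ab"): 3, frozenset("bc"): 1, frozenset("ac"): 2}
square = {frozenset(p): 1 for p in ["ab", "bc", "cd", "da"]}
```

Each pass of the outer loop performs exactly one contraction, so the sketch uses n - 1 searches; the certificate edges are what allow NI to do better in practice.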


Using an appropriate priority queue to pick the next vertex to scan, the search runs in
$O(m + n \log n)$ time; thus the total time of NI is $O(n(m + n \log n))$. Note that we have not talked about
capacitated graphs; but we do not need to explicitly construct the $E_i$, so for integer weights there
is no problem. In fact the proofs carry over for real weights as well. See Section 3.4 for discussion 
of what is actually implemented. 


Matula's $(2 + \epsilon)$-Approximation Algorithm


Recall that NI only guarantees a reduction of one node per search. This situation arises because we 
do not want to make a mistake. However, if we are willing to settle for an answer that is within 
a constant factor of the minimum cut, we can guarantee that the number of edges is reduced by a 
constant factor with each search. This result is due to Matula [46]. The algorithm is as follows: 
MatulaApprox(G, $\epsilon$)
    compute d, the minimum over v of c(v)
    compute a sparse $\left(\frac{d}{2+\epsilon}\right)$-connectivity certificate and contract all of the non-certificate edges
    recurse
    return the smaller of d and the result of the recursive call


This algorithm gives an answer within a factor of $2 + \epsilon$ of the minimum cut. The argument that this result
holds is easy. If the minimum cut is less than $d/(2 + \epsilon)$, then the sparse certificate contains all of
the minimum cut edges, so the graph given to the recursive call will have the same minimum cut.
When the minimum cut is at least $d/(2 + \epsilon)$, since the minimum cut is at most d, d is a $(2 + \epsilon)$-approximation.


We now argue the running time, assuming that the total edge capacity is bounded by a polynomial
in n; the results can be extended to arbitrary capacities [35]. Since the original graph had
at least dn/2 edges, and the certificate has only $dn/(2 + \epsilon)$, an $\Omega(\epsilon)$ fraction of the edges must be
eliminated at each step. It follows immediately that the running time is $O(m/\epsilon)$ for uncapacitated
graphs, or $O(m(\log n)/\epsilon)$ for capacitated graphs with total edge weight polynomial in n.


2.2.3 The Karger-Stein Algorithm 


Karger and Stein take a very different approach to pick edges to contract. Rather than finding 
a contractible edge, they just pick an edge at random and contract it. The intuition behind this 
surprising action is that relatively few edges cross any given minimum cut (that is what makes 
the cut minimum), so there is a reasonable chance that the edge is in fact contractible. 


In order to explain this algorithm, we start by describing a simpler algorithm of Karger [33], 
which shows one of the key ideas. Consider our capacitated graph as an uncapacitated graph with 
multiple edges to represent capacitated edges. Pick an edge at random. Clearly the probability 
it is in the minimum cut is only $\lambda/m$. Further, since each edge has unit capacity, each node must
have at least $\lambda$ incident edges, so $m \ge \lambda n/2$. (If v had fewer incident edges, {v} would be a smaller
cut.) So the probability of picking a minimum cut edge is at most $\lambda/(\lambda n/2) = 2/n$. We say a minimum
cut survives a contraction if no edge that crosses it is contracted. It follows that the probability that
a given minimum cut survives k successive contractions (being contracted down to n - k nodes) is
at least


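$$\left(1 - \frac{2}{n}\right)\left(1 - \frac{2}{n-1}\right)\cdots\left(1 - \frac{2}{n-k+1}\right) = \binom{n-k}{2} \Big/ \binom{n}{2} \qquad (2.6)$$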


In particular, if we repeatedly contract random edges until the graph has two nodes, the minimum
cut survives with probability at least $1/\binom{n}{2}$. It follows that we can repeat this algorithm $O(n^2 \log n)$
times to find a minimum cut with probability at least 1 - 1/n. This algorithm also works for
capacitated graphs; the only modification we need to make is that edges should be picked for 
contraction with probability proportional to capacity. 
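The simple contraction algorithm is short enough to sketch in full. The Python below is an illustration only: it uses a union-find structure rather than explicit contraction, picks edges with probability proportional to capacity, and assumes a connected graph given as (a, b, capacity) triples on vertices 0 to n - 1. The helper `survival_bound` evaluates the product above exactly, so it can be compared with the closed form. All names are assumptions made for this example.

```python
import random
from fractions import Fraction
from math import comb

def survival_bound(n, k):
    """The product (1 - 2/n)(1 - 2/(n-1))...(1 - 2/(n-k+1)) as an exact fraction."""
    prob = Fraction(1)
    for i in range(k):
        prob *= 1 - Fraction(2, n - i)
    return prob

def contract_to_two(n, edges, rng):
    """One run of random contractions down to two nodes; returns the cut found."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    live = list(edges)
    for _ in range(n - 2):
        live = [e for e in live if find(e[0]) != find(e[1])]   # discard self-loops
        a, b, _c = rng.choices(live, weights=[e[2] for e in live])[0]
        parent[find(a)] = find(b)
    return sum(c for a, b, c in edges if find(a) != find(b))

def karger_min_cut(n, edges, trials, seed=0):
    """Smallest cut seen over the given number of independent runs."""
    rng = random.Random(seed)
    return min(contract_to_two(n, edges, rng) for _ in range(trials))

# Two 4-cliques joined by a single edge.
clique = lambda vs: [(a, b, 1) for i, a in enumerate(vs) for b in vs[i + 1:]]
two_cliques = clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4, 1)]
```

Every run returns the value of some cut, so the answer is never too small; repetition only guards against it being too large.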


Unfortunately, the above algorithm does not run very fast. One iteration of a sequence of
n - 1 contractions can take $O(n^2)$ time; $O(n^2 \log n)$ iterations can take $O(n^4 \log n)$ time. However,
Karger and Stein [39] point out that the highest probability of failure is when the graph is small. 
In fact, if we contract down to $n/\sqrt{2}$ nodes, Equation 2.6 says that the probability the minimum
cut survives is at least one half. Thus instead of contracting down to two nodes, it makes sense to 
use a recursive approach: 


RecursiveContract(G)
    1. if G has fewer than 7 vertices, compute the minimum cut by brute force and return it
    2. repeat twice:
        (a) contract down to $\lceil 1 + n/\sqrt{2} \rceil$ nodes, giving G'
        (b) RecursiveContract(G')
    3. return the minimum of the two answers from 2b


Note that the base case is not n = 2 for technical reasons: when n is less than 7, $\lceil 1 + n/\sqrt{2} \rceil$ is
not smaller than n.
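The threshold of 7 is easy to confirm numerically. The short Python check below (the name `target_size` is introduced for this example) lists the values of n for which the contraction target is not smaller than n, which is exactly where the recursion would make no progress.

```python
from math import ceil, sqrt

def target_size(n):
    """Number of nodes RecursiveContract contracts down to from n nodes."""
    return ceil(1 + n / sqrt(2))

# Sizes at which the contraction target does not shrink the graph.
stuck = [n for n in range(2, 50) if target_size(n) >= n]
```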


Theorem 2.2.4 [39] The recursive contraction algorithm runs in $O(n^2 \log n)$ time and finds the minimum
cut with probability $\Omega(1/\log n)$.


It follows that $O(\log^2 n)$ repetitions run in $O(n^2 \log^3 n)$ time and find a minimum cut with
probability at least 1 — 1/n. Note that since the probability of success holds for any minimum cut, 
the algorithm actually finds all of them with high probability. 


KS(G) 
    repeat RecursiveContract(G) $O(\log^2 n)$ times and return the smallest cut seen.


Notice that the constant in the O, the number of iterations we must run, depends on the exact 
analysis of the success probability of the recursive contraction algorithm. This point will cause us 
some trouble in the next section, when we modify the algorithm. 


A New Variant of KS 


We note that KS makes the pessimistic assumption that the total edge capacity when n nodes
remained was $n\lambda/2$. Under this assumption, contraction to fewer than $n/\sqrt{2}$ nodes might not
preserve the minimum cut with probability at least 1/2. Consider, however, the case of two cliques
joined by a single edge. In this case, the original algorithm is being overly conservative in
contracting to only $n/\sqrt{2}$ nodes. It could in fact contract to a far smaller graph while still preserving
the minimum cut with reasonable probability. 


We give a variant of the recursive contraction algorithm that has better behavior in this respect. 
Unfortunately, we have been unable to prove that it does not have worse behavior when the graph 
really does have only $n\lambda/2$ edges. In some sense it is not terrible that we cannot prove it is never
worse, as we hope that our experiments would reveal such a problem. However, our experiments 
cannot be exhaustive, so it would be nice to know that there is not a bad graph we did not think of. 
Even worse, it turns out that a tiny change to this variant would cause it to have infinite expected 
running time, so we do need to at least show a polynomial time bound, even though we cannot 
show the $O(n^2 \log^3 n)$ time bound that we would like. Note that we also must carefully analyze
the success probability, as that is the only way to guarantee the correctness of our implementation. 
We now describe the variant. As in the PR tests, $\hat{\lambda}$ is an upper bound on the minimum cut.


NewRecursiveContract(G)
    1. compute $\min_v c(v)$, and update $\hat{\lambda}$ if appropriate
    2. if G has only two nodes, return $\hat{\lambda}$
    3. repeat twice:
        (a) mark each edge {v, w} independently with probability $1 - 2^{-c(v,w)/\hat{\lambda}}$
        (b) contract all marked edges, giving G'
        (c) NewRecursiveContract(G')
    4. return $\hat{\lambda}$


There are two main differences to explain. First, we do not need to stop at 7 nodes, because 
we do not get stuck there any more. (Recall that the old algorithm got stuck there because it tried 
to contract to a specific size, and that size was not smaller than n when n was less than 7. Now 
we mark edges, so we freely contract down to one.) Notice also that we do not even bother to stop 
at 2, which seems the natural stopping point, because the first step updates $\hat{\lambda}$ to the minimum cut
of a two node graph. The second change is the way we pick edges for contraction. To see why 
the new method makes sense, again consider the probability that the minimum cut survives the 
contractions. Let $c_1, c_2, \ldots, c_k$ denote the capacities of the minimum cut edges. The probability the
minimum cut survives is 


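$$2^{-c_1/\hat{\lambda}} \, 2^{-c_2/\hat{\lambda}} \cdots 2^{-c_k/\hat{\lambda}} = 2^{-\sum_i c_i/\hat{\lambda}} = 2^{-\lambda/\hat{\lambda}}$$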


Since $\hat{\lambda}$ is an upper bound on $\lambda$, this probability is always at least 1/2.


So the new algorithm preserves the minimum cut with the same probability, but it may contract
more edges. For example, given two cliques joined by a single edge, if $\hat{\lambda}$ is close to $\lambda$ (= 1), the
new algorithm will contract many more edges, reducing the depth of the recursion. Note, however,
that we must be careful, because if $\hat{\lambda}$ is very far from $\lambda$, the probability of contraction will
be very small and the recursion depth and running time could get very large. Thus by this
modification we hope to do more contractions when there are excess edges and reduce the recursion
depth, but in the process we introduce the risk of not contracting enough edges and increasing the
recursion depth. So we must be careful.


We resolve this problem by noticing that we have a convenient upper bound on $\lambda$ in the form
of the minimum degree of the input graph. If we use this upper bound, then the probability that
any given vertex v is not involved in a contraction is at most $2^{-c(v)/\hat{\lambda}}$, which is at most 1/2. Thus
we expect at least half of the nodes to be involved in a contraction during each execution of step 
(b), which means that we expect the contracted graph to have only 3n/4 nodes. Since we can 
do the O(n) contractions in O(m) time, if we actually reduced to 3n/4 nodes each time then the 
recurrence for the running time would be 


$$T(n) = O(n^2) + 2T(3n/4) = O(n^{\log_{4/3} 2}) \approx O(n^{2.41})$$
Unfortunately, we are not guaranteed such a reduction, so we cannot use this recurrence relation. 


Nevertheless, we conjecture that the new algorithm’s performance is in fact equal to that of the 
old one, but this remains to be proved. 
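For concreteness, the new variant can be sketched in Python as below. This is a simplified illustration, not our implementation: vertices are assumed to be integers, the graph is assumed connected with positive capacities, the upper bound is threaded through the recursion as an argument and return value, and a graph that collapses to a single node is simply skipped (a choice made for this sketch). The names `adjacency`, `new_recursive_contract`, and `new_ks` are hypothetical.

```python
import random

def adjacency(edges):
    """Symmetric adjacency dict from (a, b, capacity) triples."""
    adj = {}
    for a, b, c in edges:
        adj.setdefault(a, {})[b] = c
        adj.setdefault(b, {})[a] = c
    return adj

def new_recursive_contract(adj, lam_hat, rng):
    """adj[v][w] is c(v, w); returns the (possibly improved) upper bound."""
    lam_hat = min(lam_hat, min(sum(row.values()) for row in adj.values()))
    if len(adj) == 2:
        return lam_hat
    for _ in range(2):
        parent = {v: v for v in adj}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for v in adj:
            for w, c in adj[v].items():
                # Mark each edge once, with probability 1 - 2^(-c / lam_hat).
                if v < w and rng.random() < 1 - 2 ** (-c / lam_hat):
                    parent[find(v)] = find(w)
        contracted = {}
        for v in adj:
            for w, c in adj[v].items():
                a, b = find(v), find(w)
                if a != b:
                    row = contracted.setdefault(a, {})
                    row[b] = row.get(b, 0) + c
        if contracted:   # sketch assumption: skip a graph contracted to one node
            lam_hat = new_recursive_contract(contracted, lam_hat, rng)
    return lam_hat

def new_ks(adj, runs, seed=0):
    rng = random.Random(seed)
    return min(new_recursive_contract(adj, float("inf"), rng) for _ in range(runs))

clique = lambda vs: [(a, b, 1) for i, a in enumerate(vs) for b in vs[i + 1:]]
two_cliques = adjacency(clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4, 1)])
```

Because the bound is only ever lowered to the degree of a node in some contracted graph, the value returned is always the value of a real cut; the number of runs needed for a given confidence is what the analysis in the rest of this section determines.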


It is tempting to stop worrying about the running time now, because we are implementing the 
algorithm, so we are interested more in what it actually does than the best bound we can prove, 
but it turns out that we must be very careful. Recall that in the original algorithm, the minimum
cut survived to depth k of the recursion with probability $\Omega(1/k)$. That was fine originally, since
the depth was known to be finite, but now we must be careful. Suppose we made our base case
one node, instead of two. Now we do not terminate until the minimum cut is contracted, so the
probability we do not terminate at depth k is $\Omega(1/k)$. Hence the expected depth of the recursion
is infinite!

We will now argue that when the base case is two nodes, the running time is in fact polynomial.
From above, we can say that the probability a given node is contracted away is at least 1/4.
Thus a Chernoff bound tells us that when k nodes remain, the probability that we do not contract
away $(1 - \epsilon)k/4$ of them is at most $e^{-\epsilon^2 k/8}$. So while $k = \Omega(\log n)$, with high probability we do
get a constant factor reduction at each step, so with high probability the recursion only descends
to depth O(log n) before the graph is down to c log n nodes.


After this point, since the graph has at least three nodes before termination, we have total edge
weight at least $3\lambda/2$. Thus any time we do random contractions, the probability that all edges
survive is at most $2^{-3/2}$. Suppose now that a sequence of k log n contractions does not bring the
graph down to two nodes. We have then that in k log n independent trials, an event that happens
with probability at most $2^{-3/2}$ occurred (k - c) log n times. That is, taking $\epsilon = 2^{3/2}(k - c)/k - 1$,
we have that we are exceeding the expected number of non-contractions by a factor of $1 + \epsilon$.
Applying Chernoff bounds, we get that the probability of this happening is at most

$$e^{-\Omega(\epsilon^2 k \log n)}$$

For k sufficiently larger than c (but still constant), $\epsilon$ is constant, so we get that the probability is
$n^{-\Omega(k)}$. Since the number of paths only grows as $2^{k \log n}$, a union bound tells us that the expected
depth is O(log n), which implies a polynomial running time.


It remains to show the success probability of the new algorithm, which we need to determine 
how many iterations to run. Unfortunately, we cannot afford to be sloppy here, as whatever 
constant we compute will be hard-coded into an implementation. Since the algorithm is Monte 
Carlo, we have no other way to ensure that an implementation succeeds with an appropriate 
probability. Therefore, we devote the rest of this section to a careful analysis of the new algorithm. 
Readers not interested in the details of the analysis can safely skip it. 


We begin by reviewing the success probability analysis of the old algorithm. The main idea 
is to consider the tree defined by the recursion. (A node represents a call to RecursiveContract, 
and its children are the recursive calls.) We can use the tree to write a recurrence for the success 
probability. In particular, if we define p(k) to be the probability that a node at height k succeeds 
in finding the minimum cut of the graph it is given, then p(0) = 1 and 


$$p(k) \ge 1 - \left(1 - \tfrac{1}{2}\, p(k-1)\right)^2 = p(k-1) - \frac{p(k-1)^2}{4}$$

This recurrence follows from the algorithm; we succeed if in either trial the minimum cut survives 
the contractions and the recursive call is successful. Since we already argued that the minimum 
cut survives the contractions with probability at least 1/2, and the probability of a recursive call 
succeeding from a height k node is p(k — 1), we get the desired recurrence. The base case comes 
from the fact that the algorithm uses a deterministic strategy when the graph is small enough. 
To solve this recurrence, we use a change of variables. Let q(k) = 4/p(k) - 1. This substitution
gives the recurrence


$$q(k) = q(k-1) + 1 + 1/q(k-1)$$


It is easy to verify by induction that

$$k + 3 \le q(k) \le k + H_{k+2} + 3/2$$


Assembling, we get that

$$p(k) \ge \frac{4}{k + H_{k+2} + 5/2}$$

Now, since we succeed in a height k tree if and only if the minimum cut survives at some depth k
node, and $H_k \le 1 + \ln k$, we get the following lemma:


Lemma 2.2.5 The probability that the minimum cut survives at some recursion node of depth d in the
RCA is at least $\frac{4}{d + \ln(d+2) + 7/2}$.
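The bounds in this derivation can be checked numerically. The Python sketch below iterates the recurrence with equality, starting from q(0) = 3 (that is, p(0) = 1), and compares each term with the claimed bounds, assuming the forms $k + 3 \le q(k) \le k + H_{k+2} + 3/2$ and $p(k) \ge 4/(k + \ln(k+2) + 7/2)$. The function names are introduced for this example, and small tolerances absorb floating-point error.

```python
from math import log

def q_sequence(depth):
    """q(0), ..., q(depth) for q(k) = q(k-1) + 1 + 1/q(k-1), with q(0) = 3."""
    q = [3.0]
    for _ in range(depth):
        q.append(q[-1] + 1 + 1 / q[-1])
    return q

def harmonic(k):
    """The k-th harmonic number H_k."""
    return sum(1 / i for i in range(1, k + 1))

def bounds_hold(depth):
    """True if the claimed bounds on q(k) and p(k) hold for all k up to depth."""
    for k, qk in enumerate(q_sequence(depth)):
        p = 4 / (qk + 1)
        if not (k + 3 <= qk <= k + harmonic(k + 2) + 1.5 + 1e-9):
            return False
        if p < 4 / (k + log(k + 2) + 3.5) - 1e-12:
            return False
    return True
```

The upper bound on q(k) is tight for k = 0 and k = 1, which is why the constant 3/2 cannot be reduced.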


As the depth of the recursion is roughly (i.e., up to small additive factors) $\log_{\sqrt{2}} n = 2 \log n$, it
follows that the success probability is $\Omega(1/\log n)$.


We now consider the new algorithm. As we already argued, the new algorithm also preserves 
the minimum cut with probability at least 1/2 before recursing, so Lemma 2.2.5 holds. Unfor- 
tunately, we no longer know the depth of the recursion. Whereas in the old algorithm we were 
guaranteed a reduction in the number of nodes from n to $\lceil 1 + n/\sqrt{2} \rceil$ in each recursive call, in the
new algorithm we only expect to reduce the number of nodes by a factor of 3/4. We will therefore
have to do some additional work to determine the depth at which the minimum cut is actually
found. We begin by assuming that the upper bound $\hat{\lambda}$ has been set to $\lambda$. At the end of this section,
we justify our assumption. 


Our analysis is based on a network reliability analysis from [37]. That paper considers a graph 
in which each edge fails with probability p, and determines the probability that the graph remains 
connected. This problem is related to our objective as follows. Our goal is to show that at a 
certain recursion depth, the recursion has been terminated, which means that our graph has been 
contracted to a single node. That is, we want the set of contracted edges to span (connect) all of 
G. Inverting this objective, we can consider deleting the set of edges that were not contracted, and 
require that deleting these edges not disconnect the graph. 


Now consider a particular recursion node at depth d. The graph at this node is the outcome 
of a series of independent “contraction phases” in which each edge is contracted with probability 
$1 - 2^{-1/\lambda}$ (by our assumption that $\hat{\lambda} = \lambda$). That is, the probability of not being contracted is $2^{-1/\lambda}$.
It follows that at depth d, the probability that any edge is not contracted is $2^{-d/\lambda}$. We now invert
our perspective as in the previous paragraph. We ask whether deleting the uncontracted edges
leaves us with a single component. In other words: we consider deleting every edge of G with
probability $2^{-d/\lambda}$, and ask whether the remaining (contracted) edges connect G.


The following is proven in [45] (see also [37]), using the fact that among all graphs with minimum
cut $\lambda$, the graph most likely to become disconnected under random edge failures is a cycle:

Lemma 2.2.6 Let G have n vertices and minimum cut $\lambda$. Then the probability that G is not connected after
edge failures with probability p is at most $n^2 p^{\lambda}$.


We might hope to apply this lemma as follows. 


Corollary 2.2.7 At a node at recursion depth k log n, for k > 2, the probability that G has not been
contracted to a single node is at most $n^{2-k}$.


Proof. At depth k log n, the (cumulative) probability of non-contraction for a given edge is
$p = 2^{-(k \log n)/\lambda} = n^{-k/\lambda}$. Plugging into Lemma 2.2.6, we find that the probability we have not
contracted to a single node is at most $n^{2-k}$. ∎


Unfortunately, this corollary is not sufficient to prove what we want. At depth 3 log n in the
recursion tree, there are $n^3$ recursion nodes. Although each one has only a 1/n chance of not
being a leaf of the recursion tree, there is a reasonable chance that not all are leaves. We must
therefore perform a more careful analysis.


We evaluate the probability of success as the product of two quantities: the probability that the 
minimum cut survives contraction to the given depth and the probability that the minimum cut is 
found by our algorithm given that it survives. Conditioning on the survival of the minimum cut 
makes our analysis somewhat complicated. 


Given the conditioning event, there is some node N at depth k log n in which the minimum cut 
has survived. We would like to claim that at this point the contracted graph has only two nodes. 
Unfortunately, conditioning on the survival of the minimum cut means that no minimum cut edge 
has been contracted, a condition that breaks our reliability model. 


To deal with this problem, we rely on the fact that our new algorithm examines the degrees of 
the nodes in its inputs. It therefore suffices to show that at least one side of the minimum cut is 
contracted to a single node, since this single node will be examined by the algorithm. We will in 
fact argue that both sides will be contracted to a single node. Another way to say this is that the 
edge failures break G into exactly two connected components. 


Lemma 2.2.8 Conditioned on the fact that a minimum cut has failed, the cycle is the most likely graph to 
partition into more than two pieces under random edge failures. 


Proof. A straightforward modification of [45]. ∎


Corollary 2.2.9 Conditioned on the fact that a minimum cut has failed, the probability a graph partitions
into 3 or more pieces is at most $n p^{\lambda/2}$.


The following lemma is an immediate corollary. 


Lemma 2.2.10 Conditioned on the fact that the minimum cut survives at some node of depth $k \log n$, the
recursive contraction algorithm finds the minimum cut with probability at least


$f(k, n) = 1 - n^{1 - k/2}.$

Proof. Consider the depth $k \log n$ recursion node at which the minimum cut survives. At this
depth, the probability of edge “failure” is $n^{-k/\lambda}$. From the previous lemma, $f(k, n)$ is the probability
that the contracted edges at this recursion node reduce the graph to two nodes, implying we
find the minimum cut. ∎


Lemma 2.2.11 For any k, at depth $k \log n$, the new RCA finds the minimum cut with probability at least
the product of $f(k, n)$ and the lower bound of Lemma 2.2.5 on the probability that the minimum cut
survives to depth $d = k \log n$.

Proof. From Lemma 2.2.5 we obtain a lower bound on the probability that the minimum cut survives
in some recursion node at depth $d = k \log n$. We now condition on this event
having taken place and apply Lemma 2.2.10 to find the probability of success $f(k, n)$ given this
event. The overall probability is the product of these two quantities. ∎


Corollary 2.2.12 The probability that a single iteration finds the minimum cut is at least 


4 ] 
Proof. Set k = 2+ ate in Lemma 2.2.11. ∎


Note that it is easy to evaluate the quantity of Lemma 2.2.11 on-line, so it is not necessary to
analytically determine the optimal k. We give Corollary 2.2.12 just to show that the success probability
is again roughly $\Theta(1/\log n)$.


From this analysis of the success probability of a single iteration, it is easy to compute the num- 
ber of iterations needed to achieve a specified success probability. In particular, if we want success 
probability p, and we denote the maximum value (over k) of the probability in Lemma 2.2.11 as s, 
then i, the number of iterations we need, is given by


$i = \left\lceil \frac{\ln(1 - p)}{\ln(1 - s)} \right\rceil.$
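The relationship between p, s, and i can be checked with a short calculation. The sketch below assumes that iterations are independent, so that i iterations all fail with probability at most $(1 - s)^i$; the function name and its argument checks are illustrative and are not taken from the codes described in this thesis.

```python
import math


def iterations_needed(p: float, s: float) -> int:
    """Smallest i such that 1 - (1 - s)**i >= p.

    p is the desired overall success probability, s is a lower bound on the
    success probability of a single iteration (assumed independent).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not 0.0 < s <= 1.0:
        raise ValueError("s must lie in (0, 1]")
    if p == 0.0:
        return 0  # no iterations are needed to succeed with probability 0
    if s == 1.0:
        return 1  # a single iteration always succeeds
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - s))
```

For example, with s = 0.1 per iteration, reaching p = 0.99 takes `iterations_needed(0.99, 0.1)` iterations, which evaluates to 44.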


Finally we consider what happens if $\hat{\lambda}$ is more than $\lambda$. In this case, the probability of contracting
a minimum cut edge at any recursion node is strictly less than one half. The result is that the
probability of minimum cut survival at depth k quickly converges to a constant, instead of falling
off linearly with k. (Note that we would get a similarly dramatic change if we made the probability
strictly greater than one half: the probability would fall off exponentially with k.) In particular,
for $\hat{\lambda} > \lambda$,


$\Pr[\text{success}] > 2^{1 + \lambda/\hat{\lambda}} - 2^{2\lambda/\hat{\lambda}}.$

Further, revising Lemma 2.2.10, we see that $f(k, n) = 1 - n^{1 - k\lambda/(2\hat{\lambda})}$. But since the probability of
the minimum cut surviving at a given depth is at least a constant, we can consider going to an
arbitrary depth, in which case this quantity becomes 1. Note that we could attempt to deliberately
use $\hat{\lambda} > \lambda$, which would raise our success probability, but we would have to be very careful, as we
could easily cause the algorithm to run forever. This idea should be studied further.
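The contrast between the two regimes can be seen numerically. The sketch below rests on an assumption that is not spelled out in this passage: each recursion node has two children, and the minimum cut survives into each child independently with probability $q = 2^{-r}$, where $r$ stands for the ratio $\lambda/\hat{\lambda}$. Under that assumption the survival probability obeys $P(d+1) = 1 - (1 - qP(d))^2$, whose fixed point for $r < 1$ is $2^{1+r} - 2^{2r}$. The function names are illustrative.

```python
def survival_probability(depth: int, ratio: float) -> float:
    """Probability that the minimum cut survives in some node at this depth.

    ratio is lambda / lambda_hat in (0, 1]. Assumes two children per node,
    each preserving the minimum cut independently with probability 2**-ratio.
    """
    q = 2.0 ** (-ratio)
    prob = 1.0  # the minimum cut is intact at the root
    for _ in range(depth):
        prob = 1.0 - (1.0 - q * prob) ** 2
    return prob


def limiting_survival(ratio: float) -> float:
    """Fixed point of the recursion above: 2^(1+r) - 2^(2r)."""
    return 2.0 ** (1.0 + ratio) - 2.0 ** (2.0 * ratio)
```

With `ratio = 1.0` the value keeps shrinking as the depth grows, while with `ratio = 0.5` it settles near `limiting_survival(0.5)`, roughly 0.83.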
2.3 Tree Packings


Recall that in the first section we approached the minimum cut problem by exploiting the nice duality
between s-t minimum cuts and maximum flow. Given that this duality exists, it is natural to ask 
whether the minimum cut problem has its own dual, which we could exploit directly. It turns out 
that there is such a dual, and in this section we discuss algorithms that use it. 


An a-arborescence is a directed spanning tree rooted at a; that is, a directed acyclic subgraph 
where a has in-degree zero and every other node has in-degree one. An a-cut is a cut; its value 
is the total capacity of edges crossing the partition from the side that includes a to the other. Two 
theorems of Edmonds relate a-cuts and a-arborescences: 


Theorem 2.3.1 [20] In a directed graph the maximum number of edge-disjoint a-arborescences equals the 
minimum value of an a-cut. 


Theorem 2.3.2 [19] The edges of a directed graph can be partitioned into k a-arborescences if and only if 
they can be partitioned into k spanning trees where every vertex except a has in-degree k. 


We refer to a set of edge-disjoint trees as a tree packing. It follows that if we take our undirected 
graph and transform it into a directed graph as we did for maximum flows, then for an arbitrary 
a, the maximum cardinality of a tree packing where every node except a has the same in-degree 
is equal to the value of the minimum cut. 


If we try to consider undirected spanning trees, we get a theorem that is close, but has some 
slack in it. In particular, Nash-Williams shows: 


Theorem 2.3.3 [49] An undirected graph with minimum cut $\lambda$ contains at least $\lfloor \lambda/2 \rfloor$ edge-disjoint
spanning trees.


Note that NI packs spanning trees (and forests), but it does so with different intent, as it does 
not attempt to find a maximum packing, but rather a maximal one. NI also packs undirected 
spanning trees, which means that in general it cannot hope to find more than $\lambda/2$ full trees.


In this section we describe algorithms that use tree packings to find the minimum cut. First 
we review an algorithm of Gabow that runs in time proportional to the value of the minimum cut. 
We then give a strongly polynomial algorithm due to Karger. It is not necessary to understand 
Gabow’s algorithm to understand Karger’s, although it is used in the implementation. Some 
readers may wish to skip directly to the section on Karger’s algorithm. 


2.3.1 Gabow’s Algorithm 


Edmonds’ theorems about the relation between arborescences and minimum cuts are analogous to 
the max-flow—min-cut theorem. It is therefore natural to look for an “augmenting trees” algorithm 
to find a tree packing, analogous to the classical Ford-Fulkerson augmenting paths algorithm to 
find a maximum flow [22]. This is precisely what Gabow [23] gives. 
The augmenting paths algorithm for maximum flow works by repeatedly finding a path from
the source to the sink in the residual graph and sending as much flow along it as possible. This 
does not mean that it is possible to greedily choose flow paths; a new augmenting path may elim- 
inate the flow on some edge of a previous path, but this just amounts to the two paths exchanging 
segments. (If one path goes from s to v to w to t, and another goes from s to w to v to t, this 
amounts to one path going from s to v to t and the other going from s to w to t.) 


Likewise, the basic idea of Gabow’s algorithm is to repeatedly find a tree and delete it from 
the graph until it is impossible to find any more. Again, this is not to say that a greedy strategy 
works—on the contrary, in its attempt to find a new tree, the algorithm must consider changing 
the trees already found. 


More precisely, given a set of trees, we try to build a new tree by starting with an empty forest 
(just vertices) and connecting trees of the forest until we have a new tree. To avoid confusion, we 
refer to a tree of the forest as an f-tree. We always hope to be able to find an edge that connects two 
f-trees, but it is possible that we cannot do so with unused edges. Nevertheless, we may be able to 
“trade” edges with an existing tree, taking an edge we need to link up two f-trees and giving an 
unused edge so that the tree stays connected. This process can of course be more complicated; we 
may need to move edges around many of the existing trees in order to get an edge we want. 


Remember that there is also another restriction: we must ensure that if the maximum packing 
has k trees, then every node except a will have in-degree k. Notice that this restriction is over the 
entire packing. That is, we do not require that each node other than a have in-degree one in each 
tree, merely that the sum of a node’s in-degrees over the trees is k. To handle this restriction, we 
maintain the invariant that each f-tree has precisely one node that does not have enough incoming 
edges (except for the tree that contains a, since a never needs incoming edges). We call such a node 
the root of its f-tree. (We always call a a root.) When we look for an edge that connects two f-trees, 
we actually want an edge that is directed to the root of one of them (never a). Then when we 
merge the two f-trees there is again precisely one node that needs incoming edges, and when we 
succeed in building a whole tree, a is necessarily the root, which implies that we have satisfied the 
degree constraints. 


Given this framework, in order to make progress we need to find a way to both increase the 
in-degree of an f-tree root and connect that f-tree to another f-tree. Consider an f-tree root v (not 
a). The easy case is if v has an incoming edge from a vertex in another f-tree. Then we are all set. 
However, it is possible that all of v’s incoming edges are from elsewhere in the same f-tree. In this 
case we need to try trading edges with some tree T. We give T one of v’s incoming edges, e, so 
that v’s in-degree will increase by one; T gives e’, one of the edges on the cycle formed by adding 
e, back to the set of unused edges so that it will stay a tree. This exchange effectively changes the 
root of the f-tree to v’, the head of e’, because that is the node that now lacks an incoming edge. 
We hope that now an incoming edge of v’ connects to another f-tree, but of course we may need to 
try another trade. This process terminates when we either succeed in connecting to another f-tree 
or decide that none of the possibilities work out. 


Note that we are looking for a sequence of edge trades; we refer to this sequence as an aug- 
menting path. (This augmenting path is in some sense analogous to the augmenting paths of the 
Ford-Fulkerson maximum flow algorithm, but they are not the same.) We now give pseudocode 
for the high level work of Gabow’s algorithm: 
PackTrees(G, a)
    repeat:
        initialize a new forest of f-trees, with each vertex in its own f-tree
        while there is more than one f-tree
            mark all f-trees active, except the one containing a
            while there are active f-trees
                pick an active f-tree and search for an augmenting path
                if the search fails, return the set of trees found so far
                else do the sequence of trades, and mark inactive both of the f-trees involved
        add the one f-tree to the set of trees found so far


The real trick of Gabow’s algorithm is to search for an augmenting path in an efficient manner. 
Intuitively, it should be possible to consider every edge at most once, so it should take no more 
than O(m) time, but that means it could take O(nm) time per tree, and $O(\lambda nm)$ time total. With
such a bound we would not gain much by considering tree packings instead of maximum flows. 
However, Gabow shows how to find many augmenting paths in O(m) time, such that the time 
to add a tree is only O(mlogn). The remainder of this section describes how to find augmenting 
paths efficiently; readers not interested in such details may wish to skip on to Karger’s algorithm. 


The searches are done in a breadth-first manner. We keep track of the candidate augmenting 
paths by labeling each edge with the previous edge in the path. So when we start considering an 
f-tree root, we give all of its incoming edges a null label and add each to a queue. As a general 
step, we take the first edge from the queue and consider adding it to another tree. Adding an edge 
to a tree creates a cycle, so we label each edge of the cycle with the current edge and add it to the 
queue (we make sure that we only label each edge once per “round”, so there is no question of 
relabeling edges). We also label unused incoming edges of nodes on the cycle with the cycle edge 
incoming to that node. If we ever find an edge that connects two f-trees we halt, and trace back 
along the labels to figure out how to update. 


The efficiency of the above depends on several things. First, when we take an edge from the 
front of the queue, we do not try adding it to all the other trees. Instead we cycle through the 
trees, moving on to tree i when we first find an edge from tree i — 1 on the front of the queue. 
Second, we process the f-trees in rounds, where each edge can be labeled at most once in a round. 
In particular, we start a round by marking each f-tree (except the one containing a) as active. We 
then pick an active f-tree, search for a way to connect it to another f-tree, make the connection and 
mark both trees inactive. The point is to avoid looking at an edge too many times. Efficiency here 
also depends on the fact that the labeling is ordered so that the labeled edges in any tree form a 
subtree, making it easy to look at only the unlabeled edges in a cycle. (Given that the labeled edges 
form a subtree, we can keep track of its root. Now given the endpoints of an edge, we figure out 
which endpoint is labeled and jump to the root of the labeled subtree. We now either find that the 
unlabeled vertex has a labeled ancestor, in which case we label down to the unlabeled vertex from 
there, or we label from the root of the labeled subtree up to the least common ancestor of the two 
vertices, and down to the unlabeled vertex. In any case we preserve the property that the labeled 
nodes form a subtree, and it is easy to identify the new root of the labeled subtree if it changes.) 
Search(v)
    initialize an empty queue, Q
    label all of v’s incoming edges with a special symbol start, and add them to Q
    set i to 0
    while Q is not empty
        take the first edge, e = {w, x}, off of Q
        if e is in tree i, set i to i + 1 (modulo the number of trees)
        if both w and x are labeled, continue (the while loop)
        let u be whichever of w and x is unlabeled (one must be)
        let r be the root of the labeled subtree
        repeat:
            if u = r
                for each edge f on the path from the root of the labeled subtree up to u
                        and then down to whichever of w and x is unlabeled
                    Label(f, e)
                    if the call to Label() returns an edge, return it
                break (the repeat loop)
            if u and v are in different subtrees return e
            let y be the deeper of u, r
            if y’s parent edge is labeled
                for each edge f on the path from u down to whichever of w and x is unlabeled
                    Label(f, e)
                    if the call to Label() returns an edge, return it
                break (the repeat loop)
            set whichever of u and r is y to y’s parent


Label(e, L)
    label e with L
    if e’s head has no labeled edges
        for each unused incoming edge of e’s head, f
            if f connects two f-trees, return f
            label f with e
    return NIL


Assuming that this method is all correct, which is not obvious, but is proved by Gabow, it is 
not difficult to see the time bound. Each round looks at each edge at most once, and reduces the 
number of f-trees by at least a factor of two. Thus it takes at most O(mlogn) time to find each 
tree, totaling $O(\lambda m \log n)$ time.


2.3.2 Karger’s Algorithm 


The major problem with Gabow’s algorithm is that it takes time proportional to the value of the 
minimum cut, which could be huge for a graph with integer edge capacities, and it does not work 
at all for a graph with irrational edge capacities. It is possible to give a strongly polynomial tree 
packing algorithm [24, 5], but the time bounds are not better than O(nm). Karger finesses the
problem by showing that we can get by with a tree packing in a subgraph that does have a small
minimum cut.


One key observation is that a maximum tree packing must have at least one tree that uses at
most two minimum cut edges. This observation holds if we obtain a packing from a directed version of the
graph, as in Edmonds’ theorem, or an undirected packing, as in Nash-Williams’ theorem. In the
first case, each of the $\lambda$ cut edges gets turned into only two directed edges, and we get $\lambda$ trees, so
it is not possible for all trees to contain at least three cut edges. Likewise, in the second case, we
get at least $\lambda/2$ trees, so it is not possible to give at least three of the $\lambda$ cut edges to each tree.


If we could find a subgraph in which the cut values were small, but corresponded to the cut 
values of the original graph, it would still be the case that some tree in a packing in the subgraph 
would use only two minimum cut edges. Even better, the argument still goes through if we only 
have a subgraph in which the cut values roughly correspond to the cut values of the original
graph. Given such a subgraph, we can pack trees in it and find the minimum cut by checking for 
cuts that only use two tree edges. 

We can also motivate this approach in reverse. We know that by using Gabow’s algorithm 
we can find tree packings quickly in graphs with small minimum cuts. Thus it would be great if 
we could compute a tree packing in a subgraph with small minimum cut and use it to find the 
minimum cut of the whole graph. 


So at a very high level, Karger’s algorithm is as follows:

K(G)
    find a subgraph G’ in which cuts correspond reasonably well to the cuts in G
    pack trees in G’
    check some trees for cuts that use at most one tree edge
    check some trees for cuts that use at most two tree edges


To refer to the last two steps conveniently, we say a cut k-respects a tree if it uses k tree edges. 
Similarly, we refer to a tree that contains only k edges of a cut as k-constraining the cut. Thus we 
are interested in 1 and 2-respecting cuts of a tree that 1 or 2-constrains the minimum. 


We now describe how to accomplish each step. 


Finding a Sparse Subgraph The subgraph can be found by taking a random sample of the edges.
The following theorem of Karger captures the key property of a random sample: 


Theorem 2.3.4 [36] Consider edges with capacity greater than one to be multiple edges with capacity one. 


If we sample each edge independently with probability p = bn, then with probability at least 1— O(1/n) 


all cut values in the sampled graph are within $1 \pm \varepsilon$ of their expected value.


For implementation purposes, it will be necessary to get the best constant factors we can from
this theorem. We defer that chore to the implementation section, and just discuss application
of this theorem here. The following argument holds with probability at least $1 - O(1/n)$. The
minimum cut of the sample is at least $(1 - \varepsilon)p\lambda$, because all cuts have expected value at least
$p\lambda$. The edges of the original minimum cut sample down to at most $(1 + \varepsilon)p\lambda$ edges. If we pack
directed spanning trees, then each minimum cut edge can be used only twice, and the number of
trees is equal to the minimum cut of the sample. Hence, it is impossible to have all trees use at
least three minimum cut edges as long as $2(1 + \varepsilon)p\lambda < 3(1 - \varepsilon)p\lambda$, that is, as long as $5\varepsilon < 1$. So if we
take $\varepsilon < 1/5$, a tree packing in the sample will have at least one tree that uses only two minimum cut edges. Note
that the sampled graph has minimum cut $O(\log n)$, so Gabow’s algorithm will run quickly on it.
By computing a sparse connectivity certificate, we can ensure that the sample only has $O(n \log n)$
edges, so in fact we can guarantee that Gabow’s algorithm will run in $O(n \log^3 n)$ time.
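The sampling step itself is simple to sketch. The code below treats an edge of integer capacity c as c parallel unit edges, as in Theorem 2.3.4, and keeps each unit independently with probability p. The function names, the edge-triple format, and the per-unit loop are assumptions made for illustration; the sampling probability p is left to the caller and no particular constant is implied.

```python
import random


def sample_multigraph(edges, p, rng):
    """Sample each unit of capacity independently with probability p.

    edges is a list of (u, v, capacity) triples with integer capacities.
    Returns the sampled edges, dropping those with no surviving units.
    """
    sampled = []
    for u, v, cap in edges:
        # One coin flip per unit of capacity; fine for a sketch, although a
        # binomial sampler would avoid time proportional to the capacity.
        kept = sum(1 for _ in range(cap) if rng.random() < p)
        if kept > 0:
            sampled.append((u, v, kept))
    return sampled


def cut_value(edges, side):
    """Total capacity of edges with exactly one endpoint in the set side."""
    return sum(cap for u, v, cap in edges if (u in side) != (v in side))
```

Comparing `cut_value(sample_multigraph(edges, p, rng), side)` against `p * cut_value(edges, side)` for a few vertex sets gives a feel for how tightly the sampled cuts concentrate around their expectations.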


We note that even when we do our best to get small constants, they still are not very small, 
and this problem becomes a sticking point in the implementation. We ended up running tests on 
an implementation that violated this analysis and was consistently correct, but we have not yet 
been able to tighten the analysis to justify this action. See Section 3.6.1 for further discussion. 


Finding Tree Packings There are at least two possibilities for finding tree packings in the sub- 
graph. One is to make our undirected graph directed and use Gabow’s algorithm, which is de- 
scribed above. 


Another possibility is to approximate a packing of undirected spanning trees with the fractional
packing algorithm of Plotkin, Shmoys, and Tardos [52] (PST). This approach has the theoretical
advantage that in some cases we may find more than $\lambda/2$ trees, and in this case there will
be a tree that uses only one minimum cut edge. We did not implement this method, so we do not
describe it further.


Finding 1-respecting Cuts We now need a way to find the minimum cuts that use only two tree 
edges. We start with minimum cuts that use only one tree edge. 


Define $v^{\downarrow}$ to be the descendants of v in a tree, including v itself. (Assume the tree is rooted. We can root our trees
arbitrarily if necessary.) For arbitrary f, define $f^{\downarrow}(v) = \sum_{w \in v^{\downarrow}} f(w)$. Given values f(v) it is easy to
compute $f^{\downarrow}(v)$ for all v in linear time with a depth first (postorder) traversal of the tree.
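A minimal sketch of this subtree-sum computation follows. It assumes the tree is given as a parent array (with `None` for the root), which is a representation chosen here for illustration. Instead of a recursive postorder traversal it visits nodes in reverse preorder, which likewise guarantees that every node is handled before its parent.

```python
def subtree_sums(parent, values):
    """Return a list whose entry v is the sum of values over v's subtree.

    parent[v] is the tree parent of v, or None for the root.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children[p].append(v)

    # Preorder via an explicit stack: parents appear before their children.
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    total = list(values)
    # Reverse preorder: each node is finished before it is added to its parent.
    for v in reversed(order):
        if parent[v] is not None:
            total[parent[v]] += total[v]
    return total
```

For the tree with `parent = [None, 0, 0, 1, 1]` and `values = [1, 2, 3, 4, 5]`, the result is `[15, 11, 3, 4, 5]`.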


Figure 2.5: Cut defined by v. The cut edges are drawn solid. $c^{\downarrow}(v)$ counts the cut edges, as
well as double counting the dotted edges (which are counted once by $\rho^{\downarrow}(v)$), so the cut value is
$c^{\downarrow}(v) - 2\rho^{\downarrow}(v)$.

We notice now that $v^{\downarrow}$ defines a cut that uses only one tree edge (v’s parent edge). So we
might hope to compute cut values of 1-respecting cuts by computing $c^{\downarrow}(v)$. This quantity is not
quite correct though. $c^{\downarrow}(v)$ counts the total capacity of edges leaving the subtree rooted at v, but
it also double counts the capacity of edges that connect two nodes in $v^{\downarrow}$ (see Figure 2.5). The quantity we
want, the cut value if we cut v’s parent edge, is the total capacity of edges that leave the subtree
rooted at v. To fix this problem, we let $\rho(v)$ denote the total capacity of edges whose endpoints’
least common ancestor is v. All m least common ancestors can be found in O(m) time, so we can
compute $\rho(v)$ in O(m) time. Observe that $\rho^{\downarrow}(v)$ is precisely the total capacity of edges that connect
two nodes in $v^{\downarrow}$. Thus $C(v^{\downarrow}) = c^{\downarrow}(v) - 2\rho^{\downarrow}(v)$. It follows that we can find all cuts that use only one
tree edge in O(m) time.


Finding 2-respecting Cuts (the simple way) We can extend this approach to cuts that use two
tree edges. A pair of nodes, v and w, now define a cut. We say that v and w are comparable if
one is an ancestor of the other, and incomparable otherwise. If v and w are incomparable, we are
interested in the quantity $C(v^{\downarrow} \cup w^{\downarrow}) = C(v^{\downarrow}) + C(w^{\downarrow}) - 2C(v^{\downarrow}, w^{\downarrow})$ (see Figure 2.6); if (without loss
of generality) $v \in w^{\downarrow}$, we are interested in the quantity $C(w^{\downarrow} - v^{\downarrow}) = C(w^{\downarrow}) - C(v^{\downarrow}) + 2C(v^{\downarrow}, w^{\downarrow} - v^{\downarrow})$
(see Figure 2.7).


Figure 2.6: Cut defined by incomparable v and w. The cut edges are drawn solid. $C(v^{\downarrow}) + C(w^{\downarrow})$
counts the cut edges, as well as double counting the dotted edges, namely those counted by
$C(v^{\downarrow}, w^{\downarrow})$. So the cut value is $C(v^{\downarrow}) + C(w^{\downarrow}) - 2C(v^{\downarrow}, w^{\downarrow})$.


We already know how to compute $C(v^{\downarrow})$ and $C(w^{\downarrow})$, so we need only worry about the other
terms. Define $f_v(w) = c(v, w)$, the capacity of the edge, if any, between v and w. Define $g_w(v) =
f_v^{\downarrow}(w)$. As argued above, we can compute this function for any v in O(n) time. So we can get all
$O(n^2)$ values for pairs v, w in $O(n^2)$ time. Now $g_w^{\downarrow}(v) = C(v^{\downarrow}, w^{\downarrow})$, the desired quantity in the first
case, and we can also compute it in $O(n^2)$ time. In the second case, the sum $g_w^{\downarrow}(v)$ double counts
edges with both endpoints in $v^{\downarrow}$, so to get the desired quantity we just take $g_w^{\downarrow}(v) - 2\rho^{\downarrow}(v)$.


We now have an algorithm to compute all cuts that use at most two tree edges in $O(n^2)$ time.

Figure 2.7: Cut defined by comparable v and w. The cut edges are drawn solid. $C(w^{\downarrow})$ counts the
thin solid edges and the dotted edges. $C(v^{\downarrow})$ counts the thick solid edges and the dotted edges.
$C(v^{\downarrow}, w^{\downarrow} - v^{\downarrow})$ counts the thick solid edges. Thus the cut value, the thick solid edges plus the thin
solid edges, is given by $C(w^{\downarrow}) - C(v^{\downarrow}) + 2C(v^{\downarrow}, w^{\downarrow} - v^{\downarrow})$.


Assembling everything, we have a minimum cut algorithm that with probability at least $1 - 1/n$
finds all minimum cuts and runs in $O(n^2 \log n)$ time.
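The simple quadratic method for a single tree can be sketched as below. It builds a table whose entry for (v, w) is the capacity summed over ordered pairs with one endpoint in v's subtree and the other in w's subtree, by taking subtree sums first over w and then over v. The diagonal entry for v then equals twice the capacity inside v's subtree, so no separate least common ancestor step is needed in this sketch. The parent-array input, the table name `G`, and the returned tuple format are illustrative choices, and the graph is assumed to have no self-loops.

```python
def two_respecting_min_cut(parent, edges):
    """Minimum over cuts using one or two tree edges.

    Returns (value, nodes) where nodes is (v,) or (v, w): the cut removes the
    parent edge of each listed node. parent[root] is None.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = parent.index(None)
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)

    # Preorder positions: each subtree occupies a contiguous block.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    pre = [0] * n
    for i, v in enumerate(order):
        pre[v] = i
    bottom_up = order[::-1]  # children before parents
    size = [1] * n

    G = [[0] * n for _ in range(n)]  # starts as c(v, w)
    deg = [0] * n                    # starts as capacity incident to v
    for u, v, cap in edges:
        G[u][v] += cap
        G[v][u] += cap
        deg[u] += cap
        deg[v] += cap

    for w in bottom_up:  # sum over the descendants of w
        if parent[w] is not None:
            for v in range(n):
                G[v][parent[w]] += G[v][w]
    for v in bottom_up:  # sum over the descendants of v
        p = parent[v]
        if p is not None:
            size[p] += size[v]
            deg[p] += deg[v]
            for w in range(n):
                G[p][w] += G[v][w]

    # G[v][v] counts every edge inside v's subtree twice.
    C = [deg[v] - G[v][v] for v in range(n)]

    def below(v, w):
        """True if v lies in the subtree of w."""
        return pre[w] <= pre[v] < pre[w] + size[w]

    nonroot = [v for v in range(n) if parent[v] is not None]
    best = min((C[v], (v,)) for v in nonroot)
    for i, v in enumerate(nonroot):
        for w in nonroot[i + 1:]:
            if below(v, w):      # comparable, v under w
                value = C[w] - C[v] + 2 * (G[v][w] - G[v][v])
            elif below(w, v):    # comparable, w under v
                value = C[v] - C[w] + 2 * (G[w][v] - G[w][w])
            else:                # incomparable
                value = C[v] + C[w] - 2 * G[v][w]
            best = min(best, (value, (v, w)))
    return best
```

On the tree `parent = [None, 0, 0, 1]` with edges `(0, 1, 1)`, `(0, 2, 2)`, `(1, 3, 3)`, `(2, 3, 5)`, the function returns `(3, (1, 2))`: cutting the parent edges of nodes 1 and 2 isolates the root, which is the minimum cut of that graph.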


A Faster Way to Find 2-respecting Cuts (the fancy way) The method given above computes
the cuts for all $O(n^2)$ pairs of tree edges, and therefore takes $\Omega(n^2)$ time, regardless of the input.
If all we want is one minimum cut, we would hope to avoid considering some pairs of tree
edges. It turns out that we can. Some readers may wish to skip these details and move on to the
implementation chapter.


We start by fixing some vertex v. We now want to find the w such that the cut defined by 
cutting v’s parent edge and w’s parent edge is minimum. The key observation is that if v and 
this w are incomparable, then w must be an ancestor of a neighbor of a descendant of v, because 
we can demand that each side of the minimum cut be connected. (In order for $v^{\downarrow} \cup w^{\downarrow}$ to be
connected, there must be an edge between a descendant of v and a descendant of w.) If v and w 
are comparable, then without loss of generality we can assume w is v’s ancestor. This observation 
suggests a way to restrict the sums given above. We need only check vertices that fit one of the 
two conditions. 


It is not immediately clear that we will gain anything, because v (or a neighbor of a descendant 
of v) may have O(n) ancestors. So to be efficient, we use dynamic trees [54, 55], which among other 
things, support the following two operations on a tree with values at the nodes (val(w) at node 
w): 


AddPath(v, x) add x to the value of every node on the path from v to the root. 


MinPath(v) find the node on the path from v to the root with the minimum value. 
Both operations can be done in O(log n) time.


Thus for a leaf v, we can find its incomparable “partner” w by initializing all nodes u to have
value $C(u^{\downarrow})$, calling AddPath(u, $-2c(v, u)$) for each neighbor u of v, and then calling MinPath(u)
for each neighbor u of v and taking the minimum w returned. This method works because the
AddPath calls result in val(w) being $C(w^{\downarrow}) - 2C(v, w^{\downarrow})$. So val(w) + $C(v^{\downarrow})$ = $C(v \cup w^{\downarrow})$ (recall
that v is a leaf, so $v^{\downarrow} = v$). We want to minimize this quantity over w, for which we need only look
at val(w), and this is precisely what the calls to MinPath do.


LeafIncomparablePartner(v)
    for all u, val(u) ← $C(u^{\downarrow})$
    for all edges {v, u}, AddPath(u, $-2c(v, u)$)
    return the w corresponding to the minimum value over edges {v, u} of MinPath(u)
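The two path operations and the leaf procedure can be tried out with a naive stand-in for dynamic trees. The sketch below walks parent pointers, so each operation costs time proportional to the depth rather than O(log n), and it rebuilds the values on every call instead of undoing operations. It also adds one step that the pseudocode above leaves implicit: v and its ancestors are given infinite value first, so that MinPath can only return nodes incomparable to v. The class name, the `incident` list format, and the precomputed `cut_below` array (holding $C(u^{\downarrow})$ for every u) are assumptions made for illustration.

```python
import math


class NaivePathTree:
    """Parent-pointer tree supporting AddPath and MinPath by walking up."""

    def __init__(self, parent, values):
        self.parent = parent
        self.val = list(values)

    def add_path(self, v, x):
        """Add x to the value of every node on the path from v to the root."""
        while v is not None:
            self.val[v] += x
            v = self.parent[v]

    def min_path(self, v):
        """Return the node of minimum value on the path from v to the root."""
        best = v
        while v is not None:
            if self.val[v] < self.val[best]:
                best = v
            v = self.parent[v]
        return best


def leaf_incomparable_partner(parent, cut_below, incident, v):
    """Best incomparable partner of leaf v, as (w, cut value), or None.

    incident lists (u, capacity) for every edge {v, u} at the leaf v.
    """
    tree = NaivePathTree(parent, cut_below)
    tree.add_path(v, math.inf)  # assumption: rule out v and its ancestors
    for u, cap in incident:
        tree.add_path(u, -2 * cap)
    best = None
    for u, _ in incident:
        w = tree.min_path(u)
        if math.isinf(tree.val[w]):
            continue  # every node on this path is an ancestor of v
        value = tree.val[w] + cut_below[v]
        if best is None or value < best[1]:
            best = (w, value)
    return best
```

On the tree `parent = [None, 0, 0, 1]` with edges `(0, 1, 1)`, `(0, 2, 2)`, `(1, 3, 3)`, `(2, 3, 5)`, the subtree cut values are `[0, 6, 7, 8]`. Calling the function for leaf 2 with `incident = [(0, 2), (3, 5)]` returns `(1, 3)`, and for leaf 3 with `incident = [(1, 3), (2, 5)]` it returns `(2, 5)`.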


We can find v’s comparable partner by using the same initialization, calling AddPath(u, $2c(v, u)$)
for each neighbor u of v, and just checking MinPath(v). This time the calls to AddPath result in
val(w) = $C(w^{\downarrow}) + 2C(w^{\downarrow} - v, v)$. So we get the desired quantity by subtracting $C(v^{\downarrow})$, and we can
find the minimum over comparable w by the call to MinPath. Note that there is only one call to
MinPath, because in this case we are only interested in ancestors of v.


LeafComparablePartner(v)
    for all u, val(u) ← $C(u^{\downarrow})$
    for all edges {v, u}, AddPath(u, $2c(v, u)$)
    return the w corresponding to MinPath(v)


Note that we cannot afford to actually initialize the dynamic trees for each v we process. We 
have to initialize them once and then undo dynamic tree operations when we are done processing 
a given leaf. At least in theory, this is not a problem. 


To process an internal node, we need to look at all the neighbors of its descendants. The basic 
idea is to process all the leaves of the tree and contract them into their parents. We can then process 
the leaves of the new tree. If the tree is balanced, O(log n) such phases suffice to process the whole
tree. Unfortunately, it is possible to have n nodes with downward degree one, in which case we 
do O(n) phases. 


However, after we process the only child of a node v, we can immediately process the node 
by doing AddPath operations for each of its neighbors and then MinPath operation(s). We already 
have the values set for the neighbors of the descendants, so the AddPath operations will update the 
values appropriately. The partner of v is then either the best partner found for the descendant, or 
an ancestor of a neighbor of v. Note that we cannot immediately process a node with downward 
degree two, because after processing one child we must undo those AddPath operations before 
we can process the other child. We then have to redo them when we process the node itself. It 
is because no undoing is necessary for a node with downward degree one that we can go right 
ahead and process it. 


It follows that we can process in one phase all nodes that have downward degree less than two 
and do not have as descendants any nodes with downward degree more than one. After a phase, 
each new leaf node is a node that had at least two children previously (or else it would have been 
processed), so the number of leaves decreases by at least a factor of two in each phase. It follows
that we need only O(log n) phases. Each phase does at most O(1) dynamic tree operations per
edge, and each dynamic tree operation takes O(log n) time, so the time to process the whole tree
is $O(m \log^2 n)$.




Chapter 3 


Implementation 


Unfortunately, knowing how an algorithm works in theory is not the same as knowing how to 
implement it well. In this chapter we discuss the important implementation details of the four al- 
gorithms, including choice of data structures, implementation of primitives, choice of subroutines, 
settings of parameters, and heuristic improvements. 


In general, we only concern ourselves with implementation details that have major impact. It 
is almost always possible to speed up a program a bit by cleverly twiddling the code, but those 
are not the sort of changes that concern us. The rule of thumb we adopted was not to worry 
much about any details that changed the runtime by less than a factor of two. Some heuristic 
improvements affected the runtime by factors of a thousand; these are clearly of much greater 
interest. 


Another rule we use in the implementations is that a heuristic be amortized against the run- 
ning of the underlying algorithm. In other words, a heuristic run periodically should not cost more 
than the work done by the algorithm in the interim. Further, any heuristic preprocessing should 
not take more than near linear time, as linear work is all that we can be assured the algorithm will 
need to do. The point is that while we wish to get as much benefit as we can from heuristics, we do 
not want it to ever be the case that an implementation suffers terribly on some problems because 
of its use of heuristics. For example, we would not allow a preprocessing heuristic that ran in 
quadratic time, because even if the worst case running time of the underlying algorithm is cubic, 
it is always possible that it will run in linear time. If this happens, then we would lose a great deal 
by running the heuristic. On the other hand, after the algorithm has done quadratic work, there 
is no harm in running the heuristic, because even if it fails we lose no more than a factor of two 
in total running time. Our strategy guarantees that failed heuristics never dominate the runtime, 
and that successful heuristics are not much more expensive than the underlying algorithm. 
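One way to read this rule operationally is as a work budget: the algorithm reports how much work it has done, and a heuristic is allowed to run only once that work covers the heuristic's own cost. The sketch below is a hypothetical illustration of the idea; the class, its method names, and the abstract "work units" are not taken from the codes described in this chapter.

```python
class AmortizedHeuristic:
    """Run a heuristic only when enough algorithm work has accumulated."""

    def __init__(self, heuristic, estimated_cost):
        self.heuristic = heuristic            # callable performing the heuristic
        self.estimated_cost = estimated_cost  # its cost, in the same work units
        self.work_since_last_run = 0

    def record_work(self, units):
        """Called by the underlying algorithm as it does work."""
        self.work_since_last_run += units

    def maybe_run(self):
        """Run the heuristic if its cost is covered; report whether it ran."""
        if self.work_since_last_run < self.estimated_cost:
            return False
        self.work_since_last_run = 0
        self.heuristic()
        return True
```

With this accounting, the total time spent in heuristics never exceeds the time spent in the algorithm itself, so a heuristic that never helps costs at most a factor of two.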


Throughout this chapter we mix discussion of the abstract algorithms and our implementa- 
tions of them. To avoid confusion we distinguish them by typeface. Sans serif font refers to an 
algorithm: HO, NI, KS, K. Typewriter font refers to an implementation: ho, ni, ks, k. Since we 
have many variants of each implementation, we will distinguish them with suffixes. For example, 
ho_nopr refers to an implementation of HO that does not include PR heuristics. 


We begin by discussing graph data structures and implementation of the contraction operation.
Next we discuss general issues in incorporation of the Padberg-Rinaldi heuristics. Then we
discuss each of the algorithms in turn.


3.1 Graph Data Structures 


The internal representation of the graph is very important. There are many possible choices, and 
designation of the “best” one is only possible with respect to the operations that must be sup- 
ported. 


HO and NI clearly need a representation that makes it easy to find the neighbors of a given ver- 
tex, in order to support the push/relabel operations and the graph search, respectively. Gabow’s 
algorithm also clearly needs an adjacency representation, so K needs one if it uses Gabow’s algo- 
rithm to do the tree packing. K will also need one if it uses the fancy (dynamic trees) approach 
to finding 2-respecting cuts. If K is implemented without either of these subroutines, as it can be, 
then it is less clear what data structure is right. For KS, especially in the variant we give, there is 
no apparent reason why it would need an adjacency structure. In fact, what we really want for 
KS is to be able to pass over all of the edges quickly so we can decide whether to contract each of 
them. 


So ho, k, and ni all represent an undirected graph as a symmetric directed graph using the 
adjacency list representation. Each vertex has a doubly linked list of edges adjacent to it and 
pointers to the beginning and the end of the list. An edge {u, v} is represented by two arcs, (u,v) 
and (v, u). These arcs have pointers to each other. An arc (u,v) appears on the adjacency list of u
and has a pointer to v. 
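
The layout just described can be sketched in Python. This is only an illustration under assumptions: the names Arc, Graph, add_edge, and degree_capacity are hypothetical, and ordinary Python lists stand in for the doubly linked lists, so the sketch shows the pairing of arcs rather than constant-time deletion.

```python
class Arc:
    """One direction of an undirected edge; `rev` is the paired reverse arc."""

    def __init__(self, tail, head, cap):
        self.tail = tail
        self.head = head
        self.cap = cap
        self.rev = None


class Graph:
    def __init__(self, n):
        # adj[u] holds the arcs leaving u (a list here, a linked list in the text).
        self.adj = [[] for _ in range(n)]

    def add_edge(self, u, v, cap):
        """Store edge {u, v} as the two arcs (u, v) and (v, u)."""
        a, b = Arc(u, v, cap), Arc(v, u, cap)
        a.rev, b.rev = b, a
        self.adj[u].append(a)
        self.adj[v].append(b)
        return a

    def degree_capacity(self, u):
        """c(u): total capacity of the arcs leaving u."""
        return sum(arc.cap for arc in self.adj[u])


g = Graph(3)
e = g.add_edge(0, 1, 2)
g.add_edge(1, 2, 5)
```

The reverse pointer is what lets an algorithm update both directions of an edge without searching the other endpoint's list.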


We note that another possibility would be an adjacency matrix; however, this representation 
would be very space inefficient for a sparse graph. The space could be reduced by hashing, but 
hashing removes the simplicity of an adjacency matrix that makes it attractive. We did not explore 
this possibility. 

ks represents an undirected graph as an array of edges. This representation has less flexibility
than the adjacency list, but KS does not need that flexibility, so a smaller and simpler
representation makes sense. It is possible that the choice makes little difference in practice.


3.1.1 Implementing Contraction 


As mentioned in Section 2.2, contracting nodes v and w consists of merging v and w. This action 
will create parallel edges if v and w have any neighbors in common, and will create a self-loop if v 
and w are neighbors. Parallel edges can be merged into one edge with capacity equal to the sum of 
the original two capacities; self-loops can be deleted. So we have two important implementation 
issues: how to represent merged vertices, and what to do about unnecessary edges. 


One possibility is to do the contractions explicitly, so that a node is always represented the 
same way as a vertex, without self-loops or parallel edges. We refer to this strategy as compact 
contraction, and implement it as follows: 

[Figure 3.1 image: the graph and its adjacency lists, before and after the contraction]


Figure 3.1: Compact contraction example: Contracting edge {a, b} in a graph represented by adja- 
cency lists. Graphs in the top row are represented as shown in the bottom row. If an arc capacity 
is equal to one, the capacity is not shown. 


CompactContract(v,w)
    Replace each edge {w, x} with an edge {v, x}.
    Delete w from V(G).
    Delete self-loops and merge parallel edges adjacent to v.


Figure 3.1 gives an example of this implementation of edge contraction for the adjacency list 
representation of a graph. 


With the graph represented by an adjacency list, a careful implementation of compact contrac- 
tion of v and w takes time proportional to the sum of degrees of v and w before the contraction. 
With the graph represented by an array of edges, compact contraction can take O(m) time, and 
therefore is probably not practical. 
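
The CompactContract steps can be sketched as follows. This is a minimal sketch under assumptions, not the code used in ho or ni: the function name compact_contract is hypothetical, nested dictionaries replace the linked adjacency lists, and the four-node example graph is made up for illustration.

```python
def compact_contract(adj, v, w):
    """Merge node w into node v in place.

    adj maps each node to a dict {neighbor: capacity}; the graph is
    undirected, so adj[a][b] == adj[b][a] for every edge {a, b}.
    """
    for x, cap in adj.pop(w).items():
        if x == v:
            # The edge {v, w} would become a self-loop: delete it.
            del adj[v][w]
            continue
        # Redirect {w, x} to {v, x}, summing capacities of parallel edges.
        del adj[x][w]
        adj[v][x] = adj[v].get(x, 0) + cap
        adj[x][v] = adj[v][x]


adj = {
    "a": {"b": 1, "c": 1, "d": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"a": 1, "c": 1},
}
compact_contract(adj, "a", "b")
```

With hashing, the work here is proportional to the degree of w; the linked-list version in the text pays for both degrees because it has to detect parallel edges by scanning.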


Another possibility is to implicitly represent nodes by the sets of vertices they contain. We do 
not need to actually merge v and w, we just need to be able to tell what node a vertex belongs to. 
This observation suggests using a disjoint-set union data structure to keep track of the nodes of 
the contracted graph. We refer to this strategy as set-union contraction and implement it as follows: 

[Figure 3.2 image: the graph and its two representations, before and after the contraction]


Figure 3.2: Set-union contraction example: Contracting edge {a, b}. Graphs are in the top row. An 
adjacency list representation is in the middle row, and an array of edges representation is in the 
bottom row. Dotted lines are pointers for the set-union data structure. 

SetUnionContract(v,w)
    union(v,w)


We implement the disjoint-set union data structure by disjoint-set forests with path compres- 
sion; see e.g. [14]. In this representation each set has a distinct representative, each vertex has a 
pointer towards (but not necessarily directly to) the representative of its set, and representatives 
point to themselves. We assume that the reader is familiar with the set-union data structure. 
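
For readers who want a reminder, a disjoint-set forest with path compression can be sketched as below. The class and method names are hypothetical, and the sketch omits union by rank, so it illustrates the pointer structure described above rather than the full structure needed for the inverse-Ackermann bound.

```python
class DisjointSets:
    def __init__(self, n):
        # Every vertex starts as its own representative (points to itself).
        self.parent = list(range(n))

    def find(self, x):
        """Return the representative of x's set, compressing the path."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, v, w):
        """SetUnionContract(v, w): merge the nodes containing v and w."""
        rv, rw = self.find(v), self.find(w)
        if rv != rw:
            self.parent[rw] = rv
        return rv


ds = DisjointSets(5)
ds.union(0, 1)
ds.union(2, 3)
ds.union(1, 3)
same_node = ds.find(3) == ds.find(0)
```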


The advantage of set-union contraction is that a contraction takes only O(α(n)) time, where
α is a functional inverse of Ackermann’s function (bounded by 4 for n less than the number of
particles in the universe). Disadvantages come from the parallel arcs and self-loops which remain
in the graph. With parallel arcs, operations that we would have done in one step could take many
steps. Worse, we are not guaranteed that the number of arcs in a contracted graph is O(n²). The
other disadvantage is that it now also takes O(α(n)) time to find the head node of an arc. Note,
however, that it is possible to use set-union contraction to do many individual contractions and
then compact the graph in one pass. We call this operation compaction. This approach keeps the
benefit of fast contractions and hopefully cleans up before the parallel arcs get out of hand. We
implement compaction as follows:


CompactGraph(G)
    compute the set representative for every edge endpoint
    if the graph is represented as an edge array
        sort the edges by endpoints
        pass over the sorted list of edges, combining parallel edges and removing self-loops
    else (graph is represented by edge lists)
        append the edge lists of each vertex to the list of its set representative
        for each set representative
            unmark any marked neighbors
            for each edge e
                if e is a self-loop, delete it
                else if e reaches an unmarked neighbor, mark the neighbor with e
                else merge e with the neighbor’s mark
        remove all vertices that are not set representatives


Using two calls to a counting sort algorithm to implement the sort step, compaction can easily 
be done in O(m) time with either graph representation. 
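
The edge-array branch of CompactGraph can be sketched as follows. The function name compact_edge_array and the example data are hypothetical, and the sketch uses Python's comparison sort for brevity, so it runs in O(m log m) rather than the O(m) obtained with counting sort.

```python
def compact_edge_array(edges, find):
    """Compact an edge array after a batch of set-union contractions.

    edges: list of (u, v, cap) triples.
    find:  function mapping a vertex to its set representative.
    """
    relabeled = []
    for u, v, cap in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # both endpoints are in the same node: a self-loop
        relabeled.append((min(ru, rv), max(ru, rv), cap))
    relabeled.sort()  # stand-in for the two counting-sort passes
    merged = []
    for u, v, cap in relabeled:
        if merged and merged[-1][0] == u and merged[-1][1] == v:
            merged[-1] = (u, v, merged[-1][2] + cap)  # combine a parallel edge
        else:
            merged.append((u, v, cap))
    return merged


# Vertices 0 and 1 have been contracted into one node represented by 0.
representative = {0: 0, 1: 0, 2: 2, 3: 3}
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 4), (1, 3, 2)]
compacted = compact_edge_array(edges, representative.__getitem__)
```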


We make use of all of these strategies at different times. In general we use set-union con- 
traction followed by a compaction when we have many contractions to do, and we use compact 
contraction when we have few contractions to do. We will discuss this issue further in subsequent 
sections. 


3.2 The Padberg-Rinaldi Heuristics 


It is not clear how to get the most benefit from the PR tests. One natural strategy, which is used 
by Padberg and Rinaldi [51], is to keep applying them until no edges pass any of the tests. We
refer to this approach as the exhaustive strategy. The problem is that this strategy takes too long. It
could take O(mn) time just to apply tests PR3 and PR4 to every edge. Since edges change during
contractions, and therefore must be tested again, if one PR test passes at a time, applying these
two tests exhaustively could take O(n²m) time, dominating the running time of all the algorithms
we are implementing. Even PR1 and PR2 could take O(mn) time to apply exhaustively. So our
rule on the cost of heuristics removes the exhaustive strategy from consideration.


An alternative strategy is to apply the tests only to edges that change. Recall that a contraction 
can create parallel edges, which may be merged. The resulting edge has larger capacity, so a PR
test may now apply, even though it did not apply to either edge individually. We refer to this 
approach as the source strategy, because in a contraction based implementation of GH (or HO), 
which contracts the source and sink at the end of a max-flow computation, edges incident to the 
source are the ones that change. This strategy also extends to NI: apply the PR tests near the edge 
contracted after the last search. Both Padberg and Rinaldi [51] and Nagamochi et al. [48] make use
of this approach. Note that it could be similarly extended to KS as it was originally described, 
where we contract one edge at a time, but it does not make sense for our variant, where we 
contract many edges at once. It also makes no sense for K, which does not fit into the framework 
of GenericContractCut. Note that this strategy does not replace the exhaustive strategy, because we 
apply the tests where the underlying algorithm has changed the graph, but then do not reapply 
according to where the tests change the graph. (If we did, then it would be the exhaustive strategy, 
and we would not have gained anything.) 


As an alternative to the above methods, we introduce a new approach, which we call the pass 
strategy. The basic idea is to apply the tests as much as possible in linear time. For PR1 and PR2, 
there is a natural way to implement this idea: apply each test once to each edge. For PR3 and 
PR4 we need to skip many edges. We approach this problem by making a new low-level test that 
applies PR3 and PR4 to all edges incident to a node. Recall that PR3 looks at an edge and the
other two sides of a triangle, and PR4 looks at all triangles an edge is in. We implement our test as 
follows: 


PRTest34(v)
    label all neighbors of v
    for each unscanned neighbor w of v
        sum = 0
        for each neighbor x of w that is labeled (i.e. is a neighbor of v)
            add min(c(v,x), c(w,x)) to sum
            apply test PR3 to the triangle {v, w}, x
        apply PR4 to {v, w} using the info in sum
        mark w scanned
    mark v scanned


The main point is that a neighbor’s incident edges are only looked at if it is unscanned; it is 
therefore immediate that the following implementation of a pass only takes linear time: 

PRPass(G)
    apply PR1 and PR2 once to each edge
    mark all nodes unscanned
    while there is an unscanned node v, PRTest34(v)
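
The scanning discipline of PRTest34 can be sketched as below, restricted to the PR4 sum. Several things here are assumptions: the function name is hypothetical, the PR4 condition is taken to be c(v,w) plus the sum of min(c(v,x), c(w,x)) over common neighbors x being at least the current upper bound (the formal statement lies outside this section), PR3 is omitted, no contraction is performed, and nodes are taken in dictionary order purely for illustration.

```python
def pr4_pass_candidates(adj, bound):
    """Return the edges (v, w) that satisfy the assumed PR4 condition.

    adj[u][x] is the capacity of edge {u, x}; bound is the cut upper bound.
    """
    scanned = set()
    passed = []
    for v in adj:
        if v in scanned:
            continue
        labeled = adj[v]  # the neighbors of v play the role of the labels
        for w in adj[v]:
            if w in scanned:
                continue  # w's incident edges were already looked at once
            # Sum min(c(v,x), c(w,x)) over common neighbors x of v and w.
            total = sum(min(labeled[x], c_wx)
                        for x, c_wx in adj[w].items() if x in labeled)
            if adj[v][w] + total >= bound:
                passed.append((v, w))
            scanned.add(w)
        scanned.add(v)
    return passed


# Complete graph on four nodes with unit capacities; every degree is 3.
k4 = {u: {x: 1 for x in range(4) if x != u} for u in range(4)}
candidates = pr4_pass_candidates(k4, bound=3)
```

Each adjacency list is traversed at most once as a neighbor w and at most once as a center v, which is the source of the linear time bound.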


A linear time pass turned out to be very useful. For one thing, it makes sense to preprocess the 
input with it, because it only takes linear time. Then, if it contracted away a constant fraction of 
the nodes, we apply it again. We used this preprocessing strategy for all the algorithms, although 
we used a different constant for different algorithms. It turned out that preprocessing alone killed 
several otherwise interesting problem families—only two nodes would be left by the time we were 
done. A linear time pass also integrates nicely into NI and KS. We will discuss this issue further in 
subsequent sections. 


Observe that this strategy needs an adjacency structure to support PR3 and PR4, but it can be 
implemented with either contraction method. We can either do compact contractions or do set- 
union contractions and compact when we are done. The former has the advantage that parallel 
arcs are merged right away, which may cause a later test to pass when it would not have otherwise. 
The latter has the advantage that one compaction at the end takes only O(m) time, whereas many 
compact contractions could take O(n²) time. Our experience was that compact contraction was
better for tests PR3 and PR4, because it caused more tests to pass. Set-union contraction was 
typically a bit faster than compact contraction for PR1 and PR2, but there was less difference in 
the number of passed tests. 


We note that a subtlety of the tests says that we must not involve any v in more than one PR2 
or PR3 contraction met with equality in one pass if we use set-union contraction. Recall that in 
this case {v} may be the minimum cut, and once we have done one set-union contraction involving 
v, the old c(v) will not be correct. An easy example where we would get into trouble is a line of 
five nodes, with edge capacities two, one, one, and two. PR2 applies to both of the edges with 
capacity one, but after contracting one, it does not apply to the other. If we do both contractions 
blindly, we will miss the minimum cut. 
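
The five-node example can be checked by brute force. The sketch below is only a verification aid with hypothetical helper names; it does not implement the PR tests themselves, and it assumes the usual reading of PR2 in which the middle node, with c(v) = 2, passes the test on both of its unit-capacity edges.

```python
from itertools import combinations


def min_cut_brute_force(nodes, edges):
    """Minimum cut value by trying every proper nonempty subset (tiny graphs only)."""
    nodes = list(nodes)
    best = None
    for size in range(1, len(nodes)):
        for side in combinations(nodes, size):
            s = set(side)
            value = sum(cap for u, v, cap in edges if (u in s) != (v in s))
            best = value if best is None else min(best, value)
    return best


def contract(edges, rep):
    """Apply a vertex -> representative map, dropping self-loops."""
    return [(rep[u], rep[v], cap) for u, v, cap in edges if rep[u] != rep[v]]


line = [("a", "b", 2), ("b", "c", 1), ("c", "d", 1), ("d", "e", 2)]
identity = {x: x for x in "abcde"}
one = dict(identity, c="b")           # contract {b, c} only
both = dict(identity, c="b", d="b")   # also contract {c, d} blindly

original_cut = min_cut_brute_force("abcde", line)
after_one = min_cut_brute_force("abde", contract(line, one))
after_both = min_cut_brute_force("abe", contract(line, both))
```

One contraction preserves the minimum cut value; the second, applied with the stale value of c(v), destroys it.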


Note that for the purposes of explanation we have left out some details from the above pseu- 
docode. In particular, one must be careful about how nodes are labeled in PRTest34 so there is 
no confusion between different calls, and obviously one must not apply a test to an edge that has 
already been contracted. 


We have also failed to specify in what order to apply PRTest34. Our experience was that over 
several passes it was good to try to give every node a chance to be scanned, and it was good to 
apply tests again where they had succeeded before. To achieve this, we assigned every node a 
score, initially zero. Each time it was involved in a contraction its score was incremented, and 
each time it was skipped for a PRTest34 its score was incremented by two. We then picked nodes 
in decreasing order by score. Using a random order worked reasonably well too. 


We warn anyone who implements these tests that they are very subtle. Seemingly little changes 
do often affect performance. For example, the seemingly innocent change of applying PRTest34 in 
an arbitrary fixed order causes us to lose many contractions. It took us a long time to arrive at the 
strategies described here. We recommend comparing any new strategies against the ones we use. 

3.3 Hao-Orlin Algorithm


We based our implementation ho on the push-relabel max-flow code of Cherkassky and Goldberg
[13]. Thus several implementation decisions were made based on experience from the max-flow
context. Such decisions are likely appropriate, but not above question.


We begin by discussing choices we made in the implementation that are not addressed by the 
algorithm. We then discuss the heuristics we added. 


3.3.1 Basics 


Graph Representation As mentioned previously, we need an adjacency data structure for HO, so 
we use adjacency lists. Contractions tend not to happen in groups, so we use compact contraction 
everywhere in ho. 


Push-Relabel Strategy We chose to use the highest-label strategy to pick which nodes to dis- 
charge first. This strategy seems to give the best results in practice in the max-flow context [13]. 
For this reason we did not consider the fancy approach that uses dynamic trees. 


We use an array of buckets B[0 ... 2n - 1] to implement this efficiently. Bucket B[i] holds both
a list of all awake vertices with distance label i and a list of those that are also active. This makes 
it easy to keep track of the highest label active node, as well as making it easy to detect nodes that 
have become disconnected from the sink and must be put to sleep. 


Source and Sink Selection In some cases the algorithm is sensitive to the way the first source- 
sink pair is chosen. We chose a vertex with the largest capacity as the first sink and a neighbor of 
this vertex as the first source. Brief testing suggested that this choice works well in general. 


3.3.2 Heuristics 


Global Updates In the maximum flow context, it is useful for many problem classes to period- 
ically compute exact distances to the sink. This operation is known as a global update, and it can 
easily be accomplished by a backwards breadth-first search from the sink. (Backwards means that 
we traverse the directed edges in the wrong direction, which is easy, because we also have arcs in 
both directions.) We make several natural modifications in order to use this idea in HO. 


• The sink’s distance label is nonzero, so the backwards breadth-first search starts with the
  sink’s distance label instead of zero.

• The computation is done only on the awake graph. (We do not want to disturb the sleeping
  nodes.)

• Vertices not reachable by the breadth-first search computation are put to sleep without
  changing their distance labels. (The sink is not reachable from these vertices.)

In order to ensure that we do not spend too much time on global updates, we explicitly amortize
against relabels: a global update is performed as soon as the number of relabels since the
last global update exceeds β times the number of awake vertices. Thus relabeling time always
dominates time spent doing global updates. In our implementation, β = 2.
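
The amortization rule amounts to a counter that is reset at each global update. A minimal sketch follows, with the hypothetical names GlobalUpdateTrigger, factor, and record_relabel; factor is the multiplier on the number of awake vertices, set to 2 to match the value reported above.

```python
class GlobalUpdateTrigger:
    def __init__(self, factor=2):
        self.factor = factor
        self.relabels_since_update = 0

    def record_relabel(self, awake_vertices):
        """Count one relabel; return True when a global update is due."""
        self.relabels_since_update += 1
        if self.relabels_since_update > self.factor * awake_vertices:
            self.relabels_since_update = 0  # the caller runs the update now
            return True
        return False


trigger = GlobalUpdateTrigger()
# With 3 awake vertices, the update fires on the 7th relabel (7 > 2 * 3).
fired = [trigger.record_relabel(awake_vertices=3) for _ in range(7)]
```

The same pattern, with a different work measure, applies to the other heuristics that are amortized against the work of the underlying algorithm.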


Global updates do not always improve the running time; on some problem families, such as 
graphs with heavy components, the running times become worse. However, the running times
never become much worse in our tests, and sometimes are much better than without global up- 
dates. 


PR Heuristics As in all our algorithms, we preprocess with PR tests before doing anything else. 
We also make use of the PR tests during the execution of HO, which is very tricky, because HO 
reuses flow information for successive flow computations. We have to be careful not to disturb 
the flow information when we do contractions as a result of the PR tests. 


Fortunately, it is easy to show (using [31]) that it is safe to contract an edge incident to the 
source, as long as we saturate any new outgoing capacity from the source that this operation creates.
So we can safely use the source PR strategy. Using the source strategy is also appropriate, as it is 
the edges near the source that typically change. Unfortunately, the algorithm frequently does little 
work in a flow computation, and the source quickly becomes high degree, making even a source 
test expensive. Thus we explicitly amortize against the work of the algorithm: we apply a source 
test when the algorithm has done enough work in pushes and relabels to dominate the cost of the 
last test. 


Note that it is possible to do a PR pass, which might contract an edge not adjacent to the 
source, but we must do a global update immediately afterwards to restore the validity of the 
distance labeling. Our experience with this idea was that it usually just slowed down the code, so 
we stopped using it. 


Excess Detection We introduce a simple heuristic that often allows us to contract a vertex in the 
middle of a flow computation. The general results on the push-relabel method [26] imply that the 
excess at a vertex v is a lower bound on the capacity of the minimum s-v cut. Thus, if at some 
point during an s-t cut computation the excess at v becomes greater than or equal to the capacity 
of the minimum cut we have seen so far, we contract v into the source and saturate all the arcs 
going out of v. Note that v can be either awake or asleep. The correctness proof for this heuristic 
is straightforward. We call this heuristic excess detection.


A special case of the excess detection heuristic occurs when v is the sink. In this case we can 
stop the current s-t cut computation and go on to the next one. (Remember that we do not actually 
care about the s-t cut unless it is a smaller cut than we have seen before, and the fact that excess 
detection applies means that it is not, so we can contract s and t and move on.) 


Excess detection is inexpensive and on some problems, it reduces the number of s-t cut com- 
putations significantly. We note that one needs to be careful when implementing excess detection, 
because one contraction can cause excess detection to pass at other nodes. (Suppose the excess at 
v becomes large and v is contracted into the source. When v’s outgoing arcs are saturated, excess 
at some of v’s neighbors may become large, and these neighbors should be contracted as well.) 
This problem can be handled by keeping track of the nodes waiting to be contracted in either a
stack or a queue, as long as we make sure not to put a node on the stack/queue more than once. 
In general, excess detection requires care, because we are changing the graph during a maximum 
flow computation. 
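
The bookkeeping for cascading contractions can be illustrated with a toy model. Everything in the sketch is an assumption made for illustration: the function name, the dictionaries excess and residual_out, and the example numbers are hypothetical, and the model ignores distance labels, sleeping layers, and the actual merging of adjacency lists.

```python
from collections import deque


def excess_detection_cascade(excess, residual_out, bound, first):
    """Toy model of cascading contractions into the source.

    excess[v]       : current excess at v
    residual_out[v] : list of (w, residual capacity) for arcs leaving v
    bound           : capacity of the best cut seen so far
    first           : a node whose excess has already reached the bound
    Returns the nodes in the order they are contracted.
    """
    queue = deque([first])
    queued = {first}  # guards against putting a node on the queue twice
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w, residual in residual_out.get(v, []):
            if w in queued:
                continue  # w is, or is about to be, part of the source
            excess[w] += residual  # saturating the arc pushes flow into w
            if excess[w] >= bound:
                queued.add(w)
                queue.append(w)
    return order


excess = {"v": 5, "x": 3, "y": 1, "z": 0}
residual_out = {"v": [("x", 2), ("y", 1)], "x": [("y", 4), ("z", 1)]}
order = excess_detection_cascade(excess, residual_out, bound=5, first="v")
```

Contracting v raises the excess at x to the bound, and contracting x in turn raises the excess at y, so all three nodes end up in the source.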


Note that in fact the total flow entering v is a lower bound on the minimum s-v cut, and the 
total flow is always at least as large as the excess, so we could get a stronger test by using this 
quantity. However, we already need to maintain excesses, whereas we do not need to maintain 
total incoming flow, and brief testing suggested that the extra cost of maintaining this information 
negated the benefit from the contractions gained. 


Single-Node Layers Suppose a sleeping layer consists of a single node v. When this node is 
awakened it will necessarily be the only awake node other than the source, so it will become the
sink, and the flow value will be its excess plus any further flow it can get directly from the source. 
Further, we know that all nodes that are awake at the time v is put to sleep will be contracted into 
the source by the time it is awakened. Thus the “further flow it can get directly from the source” is
precisely the total capacity from the currently awake nodes. We therefore have all the information 
to compute the s-v flow at the time v is put to sleep, so instead of bothering to put it to sleep we 
compute the flow value and contract it into s immediately. This reordering can be helpful because 
it may cause PR tests and/or excess detection tests to pass earlier than they would have. 


3.4 Nagamochi-Ibaraki Algorithm 


NI required relatively little modification from its theory description. We just incorporated two 
heuristics given by Nagamochi et al. [48], made some careful data structure choices, and incorpo- 
rated PR tests. We differ from [48] on the latter two points. 


Nagamochi-Ono-Ibaraki Heuristics Nagamochi et al. [48] give a heuristic modification to NI 
that often helps to update the upper bound on the minimum cut. The heuristic takes advantage of 
the fact that the set of visited nodes is connected, and therefore defines a cut. It may seem that this 
is just an arbitrary cut, but recall that we always pick the most connected node to visit next. So for 
example, if we have two cliques connected by a single edge, we will likely visit all of the nodes of 
one clique before visiting any nodes of the other. Thus if we always check the cut defined by the 
set of visited nodes, we will find the minimum cut. Furthermore, since most of the edges in this 
graph are unnecessary, as soon as we find the minimum cut the sparse certificate will allow us to 
contract most of the edges. In general, this heuristic is very helpful at allowing us to get the most 
out of our sparse certificates. 


We can easily keep track of this cut value by adding c(v) and subtracting 2r(v) each time we
visit a node v. (Recall that r(v) is the capacity of edges between v and the visited nodes, so the
previous cut value contains r(v) once. Adding c(v), we count r(v) again and add in the capacity
from v to unvisited nodes. So subtracting 2r(v) we get the desired quantity.) We use the value to
update our cut upper bound. The code for this heuristic, called the α heuristic, appears at line (*)
below.
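
The update rule can be checked on the two-clique example mentioned above. In this sketch the names cut_values_along_order and build_adjacency are hypothetical, the visiting order is supplied by hand rather than chosen by largest r(v), and r(v) is recomputed from scratch at each step instead of being maintained incrementally.

```python
def build_adjacency(edges):
    """Build adj[u][x] = capacity from a list of (u, x, cap) triples."""
    adj = {}
    for u, x, cap in edges:
        adj.setdefault(u, {})[x] = cap
        adj.setdefault(x, {})[u] = cap
    return adj


def cut_values_along_order(adj, order):
    """Return the capacity of the cut around the visited set after each visit."""
    visited = set()
    cut = 0
    values = []
    for v in order:
        c_v = sum(adj[v].values())
        r_v = sum(cap for x, cap in adj[v].items() if x in visited)
        cut += c_v - 2 * r_v  # add c(v), subtract 2 r(v)
        visited.add(v)
        values.append(cut)
    return values


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge {2, 3}.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
values = cut_values_along_order(build_adjacency(edges), [0, 1, 2, 3, 4, 5])
best = min(values[:-1])  # the last value is the empty cut and must be ignored
```

The value drops to 1 exactly when the first triangle has been fully visited, which is the minimum cut of this graph.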

We now give the pseudocode for a scan-first search that contracts the contractible edges it
finds. Note that this is different from the description in the theory chapter in that we do not 
explicitly build the spanning forests, and we do not assume that edges are uncapacitated. 


ScanFirstSearchContract(G, λ̂)
    for each v ∈ V
        r(v) ← 0
        mark v unvisited
    for each e ∈ E
        mark e unscanned
    while there is an unvisited node
        let v be the unvisited node with largest r(v)
(*)     α ← α + c(v) - 2r(v)
        λ̂ ← min{α, λ̂}
        for each unscanned e = {v, w}
            r(w) ← r(w) + c(v, w)
            if (r(w) ≥ λ̂)
                G ← G/{v, w} with new node v′
                λ̂ ← min{c(v′), λ̂}
            mark e scanned
        mark v visited
    return λ̂


Priority Queue The theory bound given by Nagamochi and Ibaraki depends on use of a priority 
queue with a constant time increase-key operation, e.g. a Fibonacci heap. Preliminary experiments 
suggested that Fibonacci heaps do help a little bit on very dense graphs, but otherwise it is better 
to use a (simpler) k-ary heap. In the end we chose to use a 4-ary heap. Note that this makes the 
theoretical worst case time bound on our implementation O(mn log n). 


PR Heuristics Nagamochi et al. [48] incorporate the PR tests by applying a source test at the node
created by the last contraction of a search. This approach turns out to be very helpful, but their 
implementation has disadvantages: there is no preprocessing stage, the tests are only applied to 
one node after each search, and they are not careful to make sure the tests only take linear time 
each time. 


So we add preprocessing, and we do a PRPass (Section 2.2.1) at the end of every kth search. 
(We used k = 2.) Since a pass takes only linear time, and a search takes slightly more than linear 
time, we respect our rule on the time consumed by heuristics. It also turns out that this strategy is 
very effective. We use the score method described in Section 3.2 to decide which nodes to test, but 
we clear the scores after each search, so that we always test first the nodes involved in the most 
contractions during the search. Note that we also perform the source test, as we do not necessarily 
test the result of the last contraction in a pass, and sometimes the source test is very helpful. 

Graph Representation As mentioned previously, we need an adjacency structure, so that is what
we use. Implementation of contraction is trickier. We implemented NI using both compact con- 
traction and set-union contraction. Preliminary experiments showed that the two data structures 
are incomparable in practice: each was significantly faster on some problems and significantly 
slower on others. 


The problem is that we are only guaranteed one contraction per search, in which case we 
would prefer to use compact contraction. It is possible however, that we do many contractions, 
in which case we would prefer to use set-union contraction. So we can use set-union contraction 
and compact periodically, but we also have the problem that the PR heuristics prefer compact 
contraction. So we adopt the strategy of using set-union contraction during the scan-first search, 
compacting at the end, and using compact contraction during PRPass. This approach never allows 
the parallel edges to get out of hand, allows the PR tests to use their preferred method, and is never 
too expensive—even if we do few contractions, the cost is usually less than that of the preceding 
search. 


Our final high level implementation is as follows: 


NI(G)
    λ̂ ← min_v c(v)
    while G has more than two nodes
        λ̂ ← ScanFirstSearchContract(G, λ̂)
        CompactGraph(G)
        if this is a kth (second) iteration
            do source PR tests at last node involved in a contraction
            PRPass(G)
    return λ̂


3.5 Karger-Stein Algorithm 


For KS, the main implementation decision we made was to implement our variant (Section 2.2.3) 
instead of the original version. Early testing suggested that our variant was better, but now that 
we understand the algorithm better it would be interesting to implement the original, to make 
sure that our variant was a good idea. Note that we violate our rule about heuristics here, because 
we have not been able to prove that the variant is as fast as the original in the worst case, so more 
extensive comparisons to the original should be done. 


We needed to use an exponential distribution to sample each edge with probability exponen- 
tial in its capacity. Though we were initially concerned by the resulting large number of calls to 
exponential/logarithm functions, we found that in practice the generation of these random num- 
bers was not a significant part of the running time. Beyond that we just needed to deal with data 
structures and the PR heuristics. 


Graph Representation As mentioned previously, KS does not really need adjacency lists, so we 
did not use them. We use set-union contraction to do all the contractions of a phase and then 
compact the structure. With a careful implementation of this approach, it is relatively easy to
undo the contractions when we back out of the recursion. (It would have been more difficult with 
adjacency lists; the best thing would probably be to just copy the graph.) 


PR Heuristics The decision not to use adjacency lists basically rules out easy use of PR3 and 
PR4. Since it is clearly desirable to have the same preprocessing as the other codes, we actually 
input the graph in ni’s data structures, ran the preprocessing, and then switched to the array of 
edges. 


We experimented with internal PR3 and PR4 tests, because we were concerned about not hav- 
ing them, but we found that they were of little help. A possible explanation for this effect is the 
following (we focus on the PR4 case; a similar argument applies for the PR3 test). The PR4 test
applies when the sum of capacities on length-two paths exceeds λ̂, where the capacity of a
length-two path is the smaller of its edges’ capacities. Consider a randomized contraction phase and
its impact on such a “PR4 structure”. The total capacity of edges in the PR4 structure is 2λ̂, implying
that the probability some edge in the structure is contracted exceeds 3/4. Especially over multiple 
levels of recursion, this accumulates much faster than the 1/2 chance that a minimum cut edge 
will be contracted. Once we contract an edge in the PR4 structure, the PR4 test will no longer 
apply. In other words, in an intuitive sense, randomized contraction is taking care of the PR4 and 
PR3 tests before we have time to apply them explicitly. 


It remains to describe our internal use of PR1 and PR2. Since we already pass over all the 
edges of the graph, contracting them with an appropriate probability, it is easy to incorporate PR1 
and PR2. We apply the tests when we make the random choice of whether to contract an edge; we 
then contract it if either a test says to or the random choice says to. This implementation clearly 
adds only a small overhead. 


As mentioned in the discussion of the tests, since we are using set-union contraction we also 
need to be careful not to let a node be involved in more than one PR2 test met with equality in one
pass. 


3.6 Karger’s Algorithm 


Karger’s algorithm leaves open a large number of implementation options. We begin with the 
familiar topics of graph representation and the PR tests, and then consider each of the three parts 
of the algorithm in turn. One of our implementation decisions invalidates the algorithm’s proof 
of correctness; see the section on picking ε for more details. Thus this implementation is actually
one great big heuristic. 


Graph Representation As mentioned before, it would be conceivable to implement K without 
an adjacency structure, but we did not attempt it. The basic graph representation is as in ho 
and ni. There is, however, another issue: we must represent the trees of the tree packing. Since 
one (capacitated) edge can occur as many tree edges, we used more adjacency list structures to 
represent the trees. Each tree edge maintains a pointer to the graph edge it derived from. Note 
that we need this adjacency structure for Gabow’s algorithm; with the Plotkin, Shmoys, Tardos
packing algorithm it is possible to represent the trees with only a parent pointer for each vertex.


PR Heuristics Naturally, we also used the PR preprocessing for k. Since the algorithm does 
not do any contractions, the only way the PR tests might do further good is if we get a better 
upper bound on the minimum cut. If we get a new upper bound early enough in the execution so 
that the further contractions might help, we run the preprocessing again. We discuss how we get 
new upper bounds in subsequent sections. Our general finding was that our initial upper bound 
estimates were good enough that there was not much to be gained by doing this. 


3.6.1 Sampling 


There are several problems with the simple theoretical description of the sampling step that need 
to be finessed in an implementation. 


Estimating the Minimum Cut In order to perform the sampling step correctly, the algorithm 
needs to estimate the value of the minimum cut. In [34], Karger gives two ways to resolve this 
problem. The first is to run Matula’s linear time (2 + ε)-approximation algorithm to get a good 
estimate (see Section 2.2.2). The other option starts by getting a crude approximation and then 
samples the edges, finds a tree packing, and doubles the sampling probability, repeating if the 
number of trees in the packing is smaller than expected. Since doubling the sampling probability 
doubles the number of trees, finding the final tree packing dominates the time of finding all the 
others. If the crude approximation was within a factor of n, then the time spent sampling is at 
most O(mlogn). 


Our experience is that it is better to run Matula’s approximation algorithm. The main reason is 
that running Matula’s algorithm allows us to compute a sparse ((2 + ε)λ)-connectivity certificate 
on the input. We can then contract all remaining edges, which can greatly reduce or even solve 
some problems. Furthermore, once our input graph is sparse, our sampled graph will be sparse; 
in particular, it will have only O(nlog* n) edges. The tree packing step turns out to be expensive, 
so it is helpful to have as few edges as possible in the sampled graph. 


Sampling from Capacitated Graphs Another concern is sampling a capacitated edge. In theory, 
we treat a graph with integer capacities as an uncapacitated graph with multiple edges, but we 
do not want to actually flip a coin c(v,w) times for edge {v, w}, and we still do not know what 
to do with irrational capacities. For integer capacity edges, what we want is to pick a number 
from 0 to c(v,w) according to the binomial distribution. Notice that the sampling probability is 
inversely proportional to the cut value. So if we were to multiply all the edge capacities by some 
large factor, causing the minimum cut to go up by that factor, the probability would go down 
such that the mean of the distribution would stay the same. Therefore we can approximate the 
binomial distribution with the Poisson distribution, which is very close to the binomial for large 
numbers and small mean. Picking a number according to the Poisson distribution can be done 
such that the number of random numbers we need is the same as the value we output (see [42]); 
since the expected value of a sampled edge is always at most O(log n), this method allows us to 
sample using O(m log n) random numbers, regardless of the magnitude of the capacities. Since 
an irrational number can be approximated arbitrarily well by a rational, and we can multiply up 
a rational to get an integer, in the limit this method properly samples irrational capacity edges. 
Note that we do not need to actually carry out this process of multiplying up edges, because all 
we need to know to sample from the Poisson distribution is the mean, which is unaffected. 
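The sampling step described above can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the edge-list format, and the use of the classical multiply-uniforms Poisson generator are all assumptions made here (the method cited as [42] may differ in detail). The generator consumes one more uniform draw than the value it returns, which matches the property relied on in the text.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw from a Poisson distribution with the given mean.

    Multiplies uniform draws until the product falls to exp(-mean) or below.
    Uses (result + 1) uniform draws. Intended for small means only, since
    exp(-mean) underflows for very large means.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_edges(edges, p, rng):
    """Sample each capacitated edge (v, w, cap) with Poisson mean p * cap.

    Returns a dict mapping (v, w) to the sampled multiplicity; edges that
    sample to zero are omitted. Only the mean is needed, so capacities may
    be arbitrary non-negative reals.
    """
    sampled = {}
    for v, w, cap in edges:
        k = sample_poisson(p * cap, rng)
        if k > 0:
            sampled[(v, w)] = k
    return sampled
```

Because only the product p * cap enters the computation, scaling all capacities up and p down by the same factor leaves the sample distribution unchanged, which is the observation the paragraph above relies on.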


Picking ε Another problem is picking the ε used to compute the sampling probability. 
Unfortunately, even after reworking the analysis, the constants are quite large. Even in the limit as n 
goes to infinity, we end up needing to pack and check 36 ln n trees (see below). We can get several 
trees that 2-respect (so that we do not have to check all the trees) by packing more trees, but our 
experience was that the time spent finding the trees was enough to make the running time several 
orders of magnitude worse than the other algorithms. We discovered, however, that on our test 
examples, finding only 6 ln n trees and checking only 10 of them for 2-respecting cuts gave the 
right answer all the time. It is plausible that the analysis is not tight, but since we have not been 
able to tighten it, this implementation must be considered heuristic. There is no proof that it will 
be correct with the desired probability in all cases. This modification is in contrast to the other 
algorithms, where our heuristic changes did not affect correctness. 


For reference, we now give a reworking of Karger’s analysis [34] that gives the best constants 
we know how to get. Some readers will want to skip this section. 


Recall that we sample each edge independently with probability p, and we need to bound 
the probability that any non-minimum cut samples to less than (1 − ε)pλ edges. For any given 
cut, we can easily argue such a result holds with polynomially small probability by application of 
Chernoff bounds, but there are exponentially many cuts, so a simple union bound will not work. 
Fortunately, there can only be a few small cuts, and the probability of a large cut deviating is 
smaller than that of a small cut. Balancing the size of the cuts against the number of them, we can 
manage to get a result. 


So this analysis depends on the number of small cuts in a graph. We refer to a cut as 
α-minimum if it has value at most αλ. Unfortunately, while it is conjectured that there are only 
O(n^⌊2α⌋) α-minimum cuts, only special cases have been proved. We will use two of these pieces: 


Lemma 3.6.1 [34] There are at most n^{2α} α-minimum cuts. 
Lemma 3.6.2 [32] For α < 3/2, there are at most 9n² α-minimum cuts. 


We assume that the program will be given a parameter f, where we are supposed to succeed 
with probability at least 1 — 1/f, so we will use that parameter here. 


It turns out that the first 9n² cuts are the only ones of any concern. We will proceed by 
assuming this fact, doing the analysis, and then justifying the assumption by showing that with the 
constants we computed the larger cuts contribute almost nothing. 


The small cuts are easy to analyze. Using a Chernoff bound on each of them and a union 
bound on the result, we get that 


$$\Pr[\text{one of the smallest } 9n^2 \text{ cuts samples to} < (1-\epsilon)p\lambda \text{ edges}] \le 9n^2 e^{-\epsilon^2 p\lambda/2}$$

If we want this probability to be at most 1/2f, we get that 

$$\epsilon^2 p\lambda \ge 4\ln n + 2\ln 18f$$


Recall that we also need to know the probability that the minimum cut samples to too many 
edges, but from Chernoff bounds we immediately get 


$$\Pr[\text{minimum cut samples to} > (1+\delta)p\lambda \text{ edges}] \le e^{-\delta^2 p\lambda/4}$$


If we want this probability to be at most 1/2f, we get that 

$$\delta^2 p\lambda \ge 4\ln 2f$$


Recall also that what we want is that twice the number of edges sampled from the minimum 
cut is less than thrice the minimum cut of the sample, so that some tree must 2-respect the mini- 
mum cut. Translated into the variables above, this statement reads as 


$$2(1+\delta) < 3(1-\epsilon)$$


Solving, we find that if we take 

$$\epsilon = \frac{1}{3 + \sqrt{\dfrac{2\ln 2f}{2\ln n + \ln 18f}}}$$

and 

$$p = \frac{4\ln n + 2\ln 18f}{\epsilon^2 \lambda}$$

then with probability at least 1 − 1/f we will get at least one tree that 2-respects the minimum cut. 


Note that the 4 in the second Chernoff bound above could be tightened a bit, but it only affects 
how close ε is to 1/3, so it does not matter much. 


It now remains to show that the cuts we ignored really did not matter. For this purpose, order 
the cut values in increasing order, and denote them c_1 (= λ), c_2, c_3, .... So we have already dealt 
with c_1 ... c_{9n²}, and we are now concerned with the rest. By Lemma 3.6.2, c_{9n²} ≥ 3λ/2. Thus 

$$\Pr[\text{any of } c_{9n^2} \ldots c_{n^3} \text{ samples to} < (1-\epsilon)p\lambda \text{ edges}] \le n^3 e^{-\left(\frac{1+2\epsilon}{3}\right)^2 3p\lambda/4}$$

Assuming n > f, we can assume that 1/4 < ε < 1/3. Plugging in, we get that the probability 
above is at most n³ e^{−(27/4) ln n} = n^{−15/4}, which is clearly negligible. 


We now must deal with the remaining cuts (c_{n³} ...). Using Lemma 3.6.1, we get that 
c_{n^{2α}} ≥ αλ. Rewriting, c_k ≥ (ln k / (2 ln n)) λ. 

$$\Pr[k\text{th cut samples to} < (1-\epsilon)p\lambda \text{ edges}] \le e^{-9\ln k/4} = k^{-9/4}$$

Applying the union bound, we now need to consider 

$$\sum_{k > n^3} k^{-9/4} \le \int_{n^3}^{\infty} k^{-9/4}\,dk = \frac{4}{5}\, n^{-15/4}$$

Again, this quantity is clearly negligible. This concludes the reanalysis. 
As n goes to infinity, we get ε = 1/3 and p = (36 ln n)/λ. For f = 20 and n = 32768, which 
corresponds to the larger problems we tested on, we get ε = .284 and p = 663/λ. 663 is 
many more trees than is reasonable. 
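As a quick check of the constants, the closed forms for ε and p from the reanalysis can be evaluated numerically. The sketch below is only an illustration: the function name is hypothetical, and it returns the product pλ rather than p so that no particular minimum cut value has to be assumed. It takes f = 20, that is, a 95% success probability.

```python
import math


def sampling_parameters(n, f):
    """Evaluate epsilon and p * lambda from the reanalysis.

    epsilon    = 1 / (3 + sqrt(2 ln 2f / (2 ln n + ln 18f)))
    p * lambda = (4 ln n + 2 ln 18f) / epsilon^2
    """
    ln_n = math.log(n)
    ratio = 2 * math.log(2 * f) / (2 * ln_n + math.log(18 * f))
    eps = 1.0 / (3.0 + math.sqrt(ratio))
    p_times_lambda = (4 * ln_n + 2 * math.log(18 * f)) / eps ** 2
    return eps, p_times_lambda


eps, p_lambda = sampling_parameters(32768, 20)
print(round(eps, 3), round(p_lambda))
```

For n = 32768 and f = 20 this gives ε ≈ 0.284 and pλ ≈ 663, in agreement with the figures quoted in the text; for very large n the value of pλ / ln n approaches 36.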
3.6.2 Tree Packing 


As discussed in the theory section, there are at least two completely different possibilities for pack- 
ing trees. One approach is to use Gabow’s algorithm to pack directed spanning trees; the other is 
to use the fractional packing algorithm of Plotkin, Shmoys and Tardos (PST) to pack undirected 
trees. 


Gabow’s Algorithm Our implementation of Gabow’s algorithm mostly follows the theory. 
The only significant heuristic we add is to greedily pack depth-first search trees at the beginning, 
switching to Gabow’s algorithm when we get stuck. Our experience is that this heuristic 
often finds the majority of the trees. We experimented briefly with reducing ε (thus increasing 
the sample size), so that we could stop packing trees when we only had most of them, but we 
found that the increase in the number of trees caused by the decrease in ε negated any benefit of 
terminating Gabow’s algorithm early. 


We also considered using the trees found by scan-first search as a heuristic, but we found them 
unsuitable. For one thing, scan-first search finds undirected trees, so it is typically a factor of two 
away from the optimum packing. Further, scan-first search is breadth-first in nature, and therefore 
tends to put many of one node’s edges in one tree, thus disconnecting the node unnecessarily, and 
immediately forcing a switch to Gabow’s algorithm. We also note that the fancy way to check for 
2-respecting cuts prefers “stringy” trees, such as depth-first search trees. Scan-first search trees 
may still be a good starting point for PST though. 


If we use the theoretically justifiable sampling probability, tree packing seems to be the bottle- 
neck. Unfortunately, the biggest problem we had was that Gabow’s algorithm seems to need an 
explicit representation of all of the trees, so as problem size increases we quickly run out of mem- 
ory. There are tricks that can be played to reduce the running time of Gabow’s algorithm, such as 
the divide and conquer variant proposed by Karger [34], but it seems that for Gabow’s algorithm 
to be practical, we need to either find a way to implicitly represent the trees, or we need to tighten 
the analysis of Karger’s algorithm so that we do not need to pack so many. As already mentioned, 
we ended up handling this problem by violating the analysis and declaring the implementation 
heuristic. 


Note that using Gabow’s algorithm to pack trees has the advantage that on integer capacity 
graphs with small minimum cut we can forget about random sampling and 2-respecting cuts and 
just use Gabow’s algorithm to find the minimum cut. 


PST Algorithm We have done preliminary experiments with PST, but they are inconclusive. 
Previous implementation work using PST to find multicommodity flows [44] found that heuristic 
changes to the algorithm were crucial to good performance. We do not feel that we have worked 
enough with PST yet to include results on it in this study. It would definitely be interesting to 
know how it performs. For reference, important issues appear to be selection of a starting point 
and on-line heuristic adjustment of parameters. 


It is easy to implicitly represent the trees in PST, so if the analysis of K cannot be tightened, we 
suspect that an implementation of K that respects the analysis will have to use PST. 
Another major hope for PST is that it will in fact typically pack enough trees that we will only 
need to look for 1-respecting cuts. Saving the computation of 2-respecting cuts would improve 
practical performance a great deal. 


So we regret that we have not included PST in this study. As far as future implementation 
work on K goes, we consider PST to be deserving of the highest priority. 


3.6.3 Checking for 2-respecting cuts 


We implemented both the simple and the fancy methods for checking 2-respecting cuts. We con- 
jectured for a long time that dynamic trees would prove too complicated to be valuable, but it 
turns out that this is not the case. 


The Simple Way The simple method was implemented largely as the theory suggested. We 
combined the n computations of f_v(w) into one tree traversal, and the n computations of g_w(v) into 
another. Another change was that we used an explicit test in the middle of the computation to 
handle the two cases (comparable, incomparable) instead of a fix at the end, as proposed by the 
theory. It is not clear that this change makes any difference. We do not bother to use a linear time 
algorithm for computing least common ancestors; rather we use the path compression algorithm 
of Tarjan [56], which is wonderfully simple and runs in O(m α(m,n)) time, where α(m,n) is a 
functional inverse of Ackermann’s function. 
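For readers unfamiliar with the offline least common ancestor computation, the following is a minimal sketch in the spirit of Tarjan's path compression algorithm. It is not the thesis code: the function name and the dictionary-based tree format are assumptions, and for brevity it links each finished subtree directly to its parent without union by rank, so its worst-case bound is slightly weaker than the O(m α(m,n)) quoted above.

```python
def offline_lca(parent, root, queries):
    """Answer all LCA queries in one depth-first traversal.

    parent:  dict mapping every node to its parent (the root maps to None)
    queries: list of (u, v) pairs
    Returns a dict mapping each query pair to its least common ancestor.
    """
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    pending = {v: [] for v in parent}
    for u, v in queries:
        pending[u].append((v, (u, v)))
        pending[v].append((u, (u, v)))

    leader = {v: v for v in parent}  # union-find pointers

    def find(x):
        top = x
        while leader[top] != top:
            top = leader[top]
        while leader[x] != top:  # path compression
            leader[x], x = top, leader[x]
        return top

    finished = set()
    answer = {}
    stack = [(root, iter(children[root]))]
    while stack:
        node, child_iter = stack[-1]
        child = next(child_iter, None)
        if child is not None:
            stack.append((child, iter(children[child])))
            continue
        stack.pop()
        finished.add(node)
        for other, key in pending[node]:
            if other in finished:
                # The set containing `other` is led by the lowest ancestor
                # of `other` that is not yet merged upward, which is the LCA.
                answer[key] = find(other)
        if stack:
            leader[node] = stack[-1][0]  # merge finished subtree into parent
    return answer
```

The traversal is iterative so that deep, "stringy" trees do not exhaust the recursion limit.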


A real problem with the method is that this simplest approach to it requires a table of size 
O(n²) to keep the values of all the cuts we are interested in as we compute them. This is clearly 
the simplest thing to do, but use of O(n²) space incurs a big penalty on sparse graphs. It would be 
interesting to find another way that is equally simple but space efficient. We eventually decided 
to select between the simple method and the fancy method on-line, based on the density of the 
graph. 


The Fancy Way We used an implementation of dynamic trees written by Tamas Badics [4] for 
the first DIMACS implementation challenge. This implementation uses splay trees, which is likely 
the most practical approach. 


We made some non-obvious changes from the theory description in implementing this method. 
These are not deeply significant, but we give them here for the sake of anyone who wants to im- 
plement K himself. Some readers will want to skip on to the experiments chapter. 


The theory suggests separating computations for comparable v and w from those for 
incomparable v and w. The problem is that when looking for an incomparable partner we wish to add 
−2c(v,u) to neighbors u, and we want to find a w that is incomparable when we do the 
MinPath operation. In theory, this suggests doing an AddPath(v, ∞) first, so that no ancestor of v 
could possibly be the minimum. When we are looking for a comparable partner we wish to add 
+2c(v,u) for all neighbors u, and obviously we do want an ancestor as the answer. We resolve 
these problems in our implementation. First, for all edges we compute and store the least 
common ancestor (LCA) of the endpoints. (Recall that we needed to compute them anyway to find 
1-respecting cuts.) Now instead of doing an AddPath(u, −2c(v, u)) for the incomparable case and 
separately an AddPath(u, 2c(v, u)) for the comparable case, we do an AddPath(u, −2c(v, u)) and 
an AddPath(LCA(u, v), 4c(v, u)). This puts the right values in the right places (see Figure 3.3). 
Further, since every node’s value is initialized with the value of the cut if its parent edge is cut, and 
we check 1-respecting cuts first, we only get a comparable w when looking for an incomparable 
one if there is not one that gives a better cut. 


Figure 3.3: Adding c(v,u) up the tree. Along the dashed path we wish to add −2c(v,u), and 
along the thick path we wish to add 2c(v,u). So we add −2c(v,u) from u to the root, and we add 
4c(v,u) from LCA(v,u) to the root. 
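The net effect of the two AddPath calls can be checked with a naive stand-in for dynamic trees. The sketch below is an illustration only: it walks parent pointers in time proportional to the depth instead of using a real dynamic tree structure, and all function names are hypothetical.

```python
def add_path(val, parent, start, delta):
    """Naive AddPath: add delta to every node on the path from start to the root."""
    node = start
    while node is not None:
        val[node] += delta
        node = parent[node]


def lca(parent, a, b):
    """Least common ancestor by walking parent pointers (illustration only)."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b


def apply_edge(val, parent, v, u, cap):
    """Combined update for the edge {v, u} of capacity cap.

    Nodes strictly below LCA(u, v) on the path from u receive -2 * cap
    (incomparable candidates); nodes from the LCA up to the root receive
    -2 * cap + 4 * cap = +2 * cap (comparable candidates).
    """
    add_path(val, parent, u, -2 * cap)
    add_path(val, parent, lca(parent, u, v), 4 * cap)


# Tree: 0 is the root with children 1 and 2; 3 is below 1; 5 is below 4, below 2.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 2, 5: 4}
val = {node: 0 for node in parent}
apply_edge(val, parent, v=3, u=5, cap=7)
print(val)
```

In this small tree the nodes 5, 4 and 2 end up with −14, the root with +14, and the nodes 1 and 3 are untouched, which is the pattern the combined update is meant to produce.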


Find2RespectingCuts(T) 
    initialize a dynamic tree T’ that represents T, and has val(v) = C(v↓) 

    while T’ has more than one node 
        ProcessBoughs(root of T’) 


ProcessBoughs(v) 
    if v has multiple children 
        for each child w of v 
            (partner, value) = ProcessBoughs(w) 
            if (partner, value) is not NIL (w is top of bough) 
                for all edges {v, u} (undo dynamic tree ops) 
                    AddPath(u, 2c(v, u)) 
                    AddPath(LCA(u, v), −4c(v, u)) 
                contract(v, w) 
        return NIL 
    if v has one child w 
        (partner, value) = ProcessBoughs(w) 
        if (partner, value) is NIL, return NIL (v not on a bough) 
        (partner’, value’) = ProcessNode(v) 
        contract(v, w) 
        if value’ < value, return (partner’, value’) 
        else return (partner, value) 
    else (v is a leaf) 
        return ProcessNode(v) 


ProcessNode(v) 
    for all edges {v, u} 
        AddPath(u, −2c(v, u)) 
        AddPath(LCA(u, v), 4c(v, u)) 
    for all edges {v, u} 
        x = MinPath(u) 
        if val(x) + C(v↓) < X 
            X = val(x) + C(v↓) 
            partner = x 
    x = MinPath(v) 
    if val(x) − C(v↓) < X 
        X = val(x) − C(v↓) 
    return (partner, X) 


Chapter 4 


Experiments 


In this chapter we discuss the experiments we carried out on the implementations described in the 
last chapter. We begin by describing the design of our experiments. We then discuss the results. 


4.1 Experiment Design 


The most important part of running experiments is the inputs that are tested. It is of course impos- 
sible to try everything; subjective choices were necessarily made in the design of our experiments. 
In this section we describe and justify those choices. We begin by laying out our goals, which 
guided these decisions. We then describe the families of inputs we chose, and give details on 
precisely which experiments we ran. 


4.1.1 Goals 


As stated in the introduction, our goal in this study is to obtain efficient implementations of all the 
algorithms and meaningful comparisons of their performance. Of course, it is not obvious how to 
define meaningful in this context. As an approximate definition, we adopt the following rules: 


1. running times on real-world problems are meaningful 
2. comparisons to previous work are meaningful 


3. running times on problems that expose weaknesses in the algorithms and/or implementa- 
tions are meaningful 


4. running times that differ by a small constant factor are not meaningful 


The justification for rule 1 is obvious. Performance on real-world problems is a direct measure 
of real-world performance. It is also clear that rule 2 is reasonable, as our results would be ques- 
tionable if they differed too dramatically from previous work. Rule 3 may seem objectionable, in 
that the kinds of graphs that expose weaknesses may never come up in applications, but we main- 
tain that it is important to know just how bad performance is in the worst case. For example, if 
one implementation typically wins by a factor of five, but has a bad case where it loses by a factor 
of 10, one would probably be happy to use it and hope the bad case does not happen. However, 
if the bad case causes the implementation to lose by a factor of 10,000, one might be more hesitant 
about blindly hoping that the bad case does not occur. 


We justify rule 4 based on the fact that small factors come and go easily. In all likelihood, a 
good (and determined) programmer could speed up all of our implementations by a factor of two 
just by carefully optimizing the source code. Likewise, machine dependencies, such as cache size, 
are liable to have small effects that we might be able to fix, but our fixes might be unnecessary 
or even undesirable on another machine. We are not interested in such details. We hope to be 
able to recommend an algorithm to use and provide a starting implementation, but someone who 
is interested in the absolute best performance will have to (and probably want to) do the final 
optimization himself. 


Note that as a corollary of rule 4, we chose to look for minimum cut values, not the actual 
cuts. This decision simplifies the code a bit, and ensures that the implementations do not take a 
long time simply because they find many cuts. (Any of the algorithms can discover O(n) cuts; 
assuming it takes O(n) time to save a cut when found, the time spent saving cuts could be O(n²), 
which might dominate the runtime.) It is easy to adapt our codes to actually find the minimum cut 
without affecting the running time by more than a factor of two: first find the minimum cut value, 
then run the algorithm again and stop when we first find a cut with the same value, outputting this 
cut. Since this modification can change the running time by at most a factor of two, we deemed it 
unnecessary to worry about it in our tests. 


4.1.2 Problem Families 


We chose several different families of inputs to cover the different types of tests we decided were 
meaningful. 


Family     | Description 
TSP        | TSP instances 
PRETSP     | Preprocessed TSP instances 
NOI1-NOI6  | Random graphs with “heavy” components (after NOI) 
REG1-REG2  | Regular random graphs 
IRREG      | Irregular random graphs 
BIKEWHE    | Bicycle wheel graphs 
DBLCYC     | Two interleaved cycles 
PR1-PR8    | Two components with a min-cut between them (after PR) 


Table 4.1: Summary of problem families. 
Subproblems from a Traveling Salesman Problem Solver 


A state of the art method for solving Traveling Salesman Problem (TSP) instances exactly uses the 
technique of cutting planes. The set of feasible traveling salesman tours in a graph induces a convex 
polytope in a high-dimensional vector space. Cutting plane algorithms find the optimum tour by 
repeatedly solving a linear programming relaxation of an integer programming formulation of 
the TSP and adding linear inequalities that cut off undesirable parts of the polytope until the 
optimum solution to the relaxed problem is integral. One set of inequalities that has been very 
useful is subtour elimination constraints, first introduced by Dantzig, Fulkerson, and Johnson [15]. 
The problem of identifying a subtour elimination constraint can be rephrased as the problem of 
finding a minimum cut in a graph with real-valued edge weights. Thus, cutting plane algorithms 
for the traveling salesman problem must solve a large number of minimum cut problems (see [43] 
for a survey of the area). We obtained some of the minimum cut instances that were solved by 
Applegate and Cook [3] in their TSP solver. These are clearly desirable test data, as they are from 
a “real-world” application. 


The Padberg-Rinaldi heuristics are very effective on the TSP instances. In order to factor out 
the time spent in preprocessing, for each TSP instance we made a smaller instance by running PR 
passes until some pass fails to do any contractions. (Note that running PR passes until one pass 
fails is not the same as exhaustively applying the PR tests.) We refer to these reduced instances 
as PRETSP. We tested the implementation on both the original TSP problems and the PRETSP 
problems. 


Table 4.2 gives a summary of these instances, including their “names” which correspond to 
the original TSP problems. 


Note that these problems are smaller than we would have liked. Several PRETSP instances 
have only two nodes, and the largest PRETSP instance has only 607 nodes. The running times of 
the best algorithms are therefore small, making it hard for us to really distinguish them. We were 
unable to obtain larger instances; remember that finding a minimum cut is only a subroutine of a 
TSP solver, and the whole algorithm apparently takes too long for TSP researchers to be running 
on much larger graphs. 


Random Graphs with “Heavy” Components 


A natural type of graph on which to run a minimum cut algorithm is the type that is always 
drawn to exhibit the problem: two well connected components connected by low capacity edges. 
Nagamochi et al. [48] used a family of graphs that generalizes this idea. We use the same family, 
which is parameterized as follows: 


n the number of vertices in the graph 


d the density of edges as a percent (i.e. m = (d/100) · n(n−1)/2) 


k the number of “heavy” (well-connected) components 


P the scale between intercomponent and intracomponent edges 


#  | Name              | n     | m      | n′  | m′ 
5  | tspag00x1         | 1400  | 2931   | 148 | 300 
6  | tsprlt23x1        | 1323  | 2169   | as  | 22 
8  | tsprib34x1        | 5934  | 7287   | 150 | 292 
9  | isp rI5934x2      | 5934  | 7627   | 261 | BIZ 
16 | pa85900-x0.102988 | 85900 | 102988 | 52  | 90 


Table 4.2: Summary of TSP and PRETSP instances. n and n′ are the numbers of nodes in the TSP 
and PRETSP instances, respectively. m and m′ are the corresponding numbers of edges. 
The graph is constructed by first taking n vertices and randomly coloring them with k colors. We 
then add one random cycle on all vertices, so the graph will be connected, and add the remaining 
m—n edges at random. Every edge added gets a random capacity. If the endpoints have different 
colors, the capacity is chosen uniformly at random from [1, 100]; otherwise the capacity is chosen 
uniformly at random from [1, 100P]. 
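The construction just described can be sketched as a small generator. Several details are assumptions made for illustration and are not stated in the text: capacities are drawn as integers, the m − n extra edges may repeat existing edges, m is computed as d·n(n−1)/200 rounded down, and the function name and return format are hypothetical.

```python
import random


def noi_graph(n, d, k, P, rng):
    """Generate a random graph with k "heavy" components.

    n: number of vertices, d: density in percent, k: number of colors,
    P: capacity scale for edges inside a component.
    Returns (color, edges) where edges is a list of (v, w, capacity).
    """
    m = d * n * (n - 1) // 200
    color = [rng.randrange(k) for _ in range(n)]

    # One random cycle through all vertices keeps the graph connected.
    order = list(range(n))
    rng.shuffle(order)
    pairs = [(order[i], order[(i + 1) % n]) for i in range(n)]

    # The remaining m - n edges join random pairs of distinct vertices.
    for _ in range(max(0, m - n)):
        v, w = rng.sample(range(n), 2)
        pairs.append((v, w))

    edges = []
    for v, w in pairs:
        upper = 100 * P if color[v] == color[w] else 100
        edges.append((v, w, rng.randint(1, upper)))
    return color, edges
```

For example, `noi_graph(1000, 50, 2, 1000, random.Random(0))` produces an instance with the NOI6 parameters n = 1000, d = 50, k = 2 and P = 1000.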


Following [48], we tested on 6 subfamilies. Our families are the same in spirit as those of [48], 
but we use larger problem sizes and we added some data points where we felt it was appropriate. 


Family | n                       | d  | k                                                                  | P 
NOI1   | 300, 400, 500, 600, ... | 50 | 1                                                                  | 300, 400, 500, 600, ... 
NOI2   | 300, 400, 500, 600, ... | 50 | 2                                                                  | 300, 400, 500, 600, ... 
NOI5   | 1000                    | 50 | 1, 2, 3, 5, 7, 10, 20, 30, 33, 35, 40, 50, 100, 200, 300, 400, 500 | 1000 
NOI6   | 1000                    | 50 | 2                                                                  | 5000, 2000, 1000, 500, 250, 100, 50, 10, 1 


Families NOI1 and NOI2 study the effect of varying the number of vertices. Families NOI3 
and NOI4 study the effect of varying the density of the graph. Family NOI5 studies the effect 
of varying the number of components, and family NOI6 studies the effect of varying the ratio 
between the weights of the intercomponent and intracomponent edges. 


Regular Random Graphs 


Recall that in the analysis of the original KS, we lower bound m with λn/2. It follows that a graph 
where this bound is tight is liable to be an interesting graph for KS. If uncapacitated, such a graph 
is interesting in general, because it has the minimum number of edges possible given its minimum 
cut value. (An uncapacitated graph must have at least λ edges incident to every vertex, since otherwise 
some vertex defines a smaller cut.) Recall also that one of the heuristics in NI computes lower 
bounds on cut values between edge endpoints, and can lead to many contractions in one phase. It 
makes sense that a graph that has as few edges as possible might cause this heuristic to fail. 


We achieve this extreme case with a λ-regular graph. In particular, we take the union of λ/2 
random cycles. Preliminary experiments with a union of λ random matchings gave similar results. 
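A union of random cycles is simple to generate. The sketch below is an illustration with a hypothetical function name; it assumes that parallel edges arising from different cycles are kept, so the result is in general a multigraph in which every vertex has degree exactly λ.

```python
import random


def regular_random_graph(n, lam, rng):
    """Union of lam / 2 random Hamiltonian cycles on n vertices.

    Returns a list of unit-capacity edges (v, w); every vertex has degree lam.
    """
    if lam % 2 != 0 or n < 3:
        raise ValueError("need even lam and n >= 3")
    edges = []
    for _ in range(lam // 2):
        order = list(range(n))
        rng.shuffle(order)
        edges.extend((order[i], order[(i + 1) % n]) for i in range(n))
    return edges
```

The edge count is n·λ/2, which is exactly the lower bound discussed above.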


Family | n                             | λ 
REG1   | 1000, 2000, 4000, 8000, 16000 | 8, 16, 32, 64, 128, 256 
       | 1000                          | 512, 1024 
REG2   | 128, 256, 512, 1024, 2048     | 
REG1 tests the effect of varying n and λ on sparse graphs. REG2 tests the effect of varying n 
on dense graphs. 


Irregular Random Graphs 


An obvious question about the previous family is what happens if symmetry is broken a little bit. 
In particular, we consider taking the union of λ random matchings (or cycles), and then adding 
some edges of another random matching. It is not obvious that this family will produce different 
results, but it turns out that NI changes behavior in interesting ways. We add another parameter 
e, the number of extra edges, to the parameters from the REG families. 


Family | n    | e 
IRREG  | 4000 | 0, 2, 4, 16, 64, 256, 1024, 2000, 2976, 3744, 3936, 3984, 3996, 3998, 4000 


Bicycle Wheels 


Another extreme graph is a cycle. An uncapacitated cycle has (n choose 2) minimum cuts, a value that 
matches the upper bound, and nλ/2 edges, a value that matches the lower bound. A cycle also 
has only one undirected spanning tree, despite having minimum cut value two, so it exhibits the 
extreme case for the size of a tree packing. 


Unfortunately, PR2 applies at every vertex in a cycle, so PR preprocessing always solves cycles. 
One natural way to try to overcome this problem is to make a “wagon wheel” instead. That is, add 
an extra vertex that is connected to every vertex on the cycle. If the capacities of the new edges 
are small compared to the cycle edges, then the graph is still very much like a cycle in terms of its 
cuts, but PR2 no longer applies. Unfortunately, now PR3 applies at every vertex. So we go one step 
further: we take a cycle and add two extra vertices, one connected to every other node, and the 
other connected to the remaining nodes. We also connect the two added vertices. We refer to this 
graph as a bicycle wheel, as that is precisely what it looks like. (See Figure 4.1.) Note that this 
graph is now immune to all the PR tests. (In fact, it is the example of a PR immune graph we gave 
in Section 2.2.1.) 


We pick the capacities so that all trivial cuts have the same value. This choice causes the “rim” 
to have large capacity, and the “spokes” to have small capacity, which means that the cuts are still 
very much like those of a cycle. The only parameter then is the number of vertices, so that is all 
we vary. 
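One way to realize such a graph is sketched below. The specific capacities are an assumption chosen for illustration, since the text only requires that all trivial cuts have equal value: spokes and the hub-to-hub edge get capacity 2, and rim edges get capacity (n − 2)/2, which makes every trivial cut equal to n. The function name is hypothetical.

```python
def bicycle_wheel(n):
    """Bicycle wheel on n vertices: a rim cycle of n - 2 vertices plus two hubs.

    Rim vertices are 0 .. n-3; the hubs are n-2 and n-1. Even rim vertices
    attach to one hub and odd rim vertices to the other.
    Returns a list of (v, w, capacity) edges.
    """
    rim = n - 2
    if rim < 4 or rim % 2 != 0:
        raise ValueError("need even n >= 6")
    hub_a, hub_b = rim, rim + 1
    spoke_cap = 2          # assumed value
    axle_cap = 2           # assumed value
    rim_cap = rim // 2     # chosen so that all trivial cuts equal n

    edges = [(i, (i + 1) % rim, rim_cap) for i in range(rim)]
    edges += [(i, hub_a if i % 2 == 0 else hub_b, spoke_cap) for i in range(rim)]
    edges.append((hub_a, hub_b, axle_cap))
    return edges
```

With these capacities a rim vertex has two rim edges and one spoke, for a total of (n − 2) + 2 = n, and each hub has (n − 2)/2 spokes and the hub-to-hub edge, again for a total of n.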


BIKEWHE | 1024, 2048, 4096, 8192, 16384, 32768 
Figure 4.1: A 10-vertex bicycle wheel graph. 


Two Interleaved Cycles 


Another way to get a graph that is basically a cycle, but is immune to the PR tests, is to use two 
cycles. For this family we use an n-node cycle with capacity 1000, and we make a second cycle by 
connecting every third node of the original cycle with a unit capacity edge. 
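The base construction, before any minimum cut is hidden in it, can be sketched as follows. This reads "connecting every third node" as joining node i to node i + 3 (mod n), an interpretation assumed here because it gives every node exactly two unit edges. The function names are hypothetical.

```python
def double_cycle(n):
    """Outer n-cycle with capacity 1000 plus unit edges from i to i + 3 (mod n).

    Returns a list of (v, w, capacity) edges. Requires n > 6 so that the
    inner edges are all distinct.
    """
    if n <= 6:
        raise ValueError("need n > 6")
    outer = [(i, (i + 1) % n, 1000) for i in range(n)]
    inner = [(i, (i + 3) % n, 1) for i in range(n)]
    return outer + inner


def cut_value(edges, side):
    """Total capacity of the edges with exactly one endpoint in the set `side`."""
    return sum(cap for v, w, cap in edges if (v in side) != (w in side))
```

Under this reading every trivial cut has value 2002, and cutting off an arc of at least three consecutive nodes (leaving at least three on the other side) crosses two outer and six inner edges for a value of 2006.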


In order to make this family a little more interesting, we also “hide” a minimum cut in the 
middle. That is, we take two opposite cycle arcs and decrease their capacity by three. We then 
increase the capacity of four of the second cycle’s edges by three, such that all trivial cuts still have 
value 2002. In the process, however, we have created a cut of value 2000. Note there are also 
Ω(n²) cuts of value 2006. This modification is cumbersome to describe in words, but the picture 
is clear. See Figure 4.2. 


DBLCYC | 1024, 2048, 4096, 8192, 16384, 32768 


PR 


One final problem family is one used by Padberg and Rinaldi [51]. Our only use of this family is 
to check the effectiveness of our PR strategies against those of Padberg and Rinaldi. 


Figure 4.2: Two interleaved cycles. The outer cycle edges have capacity 1000, except for the two 
thin ones, which have capacity 997. The inner edges have capacity 1, except for the 4 thick ones, 
which have capacity 4. The dashed line shows the minimum cut (value 2000). The dotted line 
shows one of many near minimum cuts (value 2006). 


This family includes two different types of graphs. The first type is a random graph with an 
expected density d. The second type is a random graph that consists of two components, each held 
together by “heavy” edges, with “light” edges going between the components; thus the minimum cut is 
very likely to separate the two components (similar to the NOI families). The generator takes three 
parameters: 


• n - the number of vertices, 


• d - the density (as a percentage), 


• c - the type of graph to generate (1 or 2). 


If c = 1, for each pair of vertices, with probability d, we include an edge with weight uniformly 
distributed in [1, 100]. If c = 2, we split the graph into two components, one containing vertices 1 
through n/2 and the other containing vertices n/2 + 1 through n. Again, for each pair of vertices, 
we include an edge with probability d. If the two vertices are in the same component, the edge 
weight is chosen uniformly from [1, 100n], but if the vertices are in different components, the edge 
weight is chosen uniformly from [1, 100]. 
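The generator just described can be sketched as follows. This is a minimal illustration and not the Padberg-Rinaldi generator itself: the name `pr_graph`, the use of integer weights, the conversion of the percentage d to a probability d/100, and the `seed` argument are all assumptions made for the example.

```python
import random


def pr_graph(n, d, c, seed=None):
    """Return a list of (u, v, weight) edges on vertices 1..n.

    d is the density as a percentage; c selects the graph type (1 or 2).
    Integer weights and the seed parameter are assumptions of this sketch.
    """
    rng = random.Random(seed)
    p = d / 100.0
    half = n // 2  # vertices 1..n/2 form the first component when c == 2
    edges = []
    for u in range(1, n + 1):
        for v in range(u + 1, n + 1):
            if rng.random() >= p:
                continue  # this pair gets no edge
            same_component = (u <= half) == (v <= half)
            # Heavy weights only inside a component of a type-2 graph.
            max_weight = 100 * n if (c == 2 and same_component) else 100
            edges.append((u, v, rng.randint(1, max_weight)))
    return edges


if __name__ == "__main__":
    for edge in pr_graph(6, 50, 2, seed=1):
        print(edge)
```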


[Table: parameter values used for the PR family] 


As we wish to compare directly to Padberg and Rinaldi, these values are precisely those used 
by Padberg and Rinaldi in their paper [51]. 


4.1.3 Codes 


As discussed in the previous chapter, for each algorithm we made numerous decisions about the 
implementation. While the process of implementation involved testing many of these decisions, 
there are far too many for us to attempt to present data on everything we tried. We picked several 
important variations on which to report data. See Table 4.3 for a summary. 


NI without PR heuristics 


Table 4.3: Summary of the implementations we tested 
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Notice that the suffix _nopr does not mean the same thing in all cases. For HO and NI we 
show what happens when PR heuristics are disabled entirely, whereas for KS we never consider 
disabling PR preprocessing, and for K we never disable the PR heuristics at all. This decision is 
based on experience with the PR tests. 


It turns out that no one algorithm is clearly best, and we did not attempt to make a hybrid 
algorithm that would fill this role. We can give a good idea of what such an implementation would 
look like based on the data we got from the implementations above. 


4.1.4 Setup 


Our experiments were conducted on a machine with a 200MHz PentiumPro processor, 128M of 
RAM, and a 256K cache. Our codes are written in C and compiled with the GNU C compiler (gcc) 
using the -O4 optimization option. All of the Monte Carlo implementations (ks, ks_nopr, k) used 
95% as a minimum success probability. 
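To illustrate how a success-probability target turns into a repetition count, the sketch below computes how many independent trials are needed when each trial is known to succeed with probability at least p. The value of p and the function name `trials_needed` are hypothetical; the constants our codes actually use come from the analysis of each algorithm and are not reproduced here.

```python
import math


def trials_needed(p, target=0.95):
    """Smallest k with 1 - (1 - p)**k >= target, for independent trials.

    p is a hypothetical lower bound on the success probability of one trial.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        print(p, trials_needed(p))
```

A looser bound on p inflates the trial count roughly in proportion to 1/p, which is why the tightness of the analysis matters so much for the Monte Carlo codes.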


We averaged five runs wherever randomness was involved. That is, for the problem families 
that are constructed randomly, we constructed five instances for each setting of the parameters. 
Further, for the randomized algorithms, we did five runs on each instance. We report averages. 


As mentioned in Section 4.1.1, our implementations do not actually output the minimum cut, 
or save the minimum cut in any special data structure. However, at the time the minimum cut is 
encountered they do have the minimum cut stored in some internal data structure from which it 
could easily be extracted. 


In order to see why different implementations had different performances, we recorded many 
quantities in addition to total running time: 


For all implementations we measured: 


total running time not including time to input the graph 


discovery time the time at which the algorithm first encountered the minimum cut. This 
quantity tells us two things. First, if we use the two pass method to get the actual cut, 
as described in Section 4.1.1, the discovery time will be the running time of the second 
pass. So running time plus discovery time should be the time to find and output a 
minimum cut. Second, for KS, discovery time tells us how many iterations of the recursive 
contraction algorithm we actually needed to run. If discovery times are always far less 
than running times, we might suspect that the analysis is not tight. 


edge scans the number of times an edge was examined. Examining edges is a basic unit 
of work that all the algorithms perform. Hence this quantity provides a roughly 
machine-independent measure of running time. 


For implementations that perform PR tests we measured: 


preprocessing time the time spent preprocessing the graph with PR tests. 


initial PR contractions the number of contractions done by the PR preprocessing. 
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internal PR contractions the number of contractions due to PR tests while running the main 
algorithm. 


For HO implementations we measured: 


s-t cuts the number of s-t cut (max-flow) computations, not counting single node layers. 
average problem size the average number of vertices in an s-t cut problem. 


one node layers the number of times a sleeping layer has exactly one node. (Recall that we 
process this case specially.) 


excess contractions the number of contractions due to the excess detection heuristic. 


For NI implementations we measured: 


phases the number of scan-first searches executed. 


For KS implementations we measured: 


leaves the number of leaves of the recursion tree. 


For K implementations we measured: 


packing time the amount of time spent packing trees. 


respect time the amount of time spent checking for 1- and 2-respecting cuts. 


4.2 Results 


In this section we discuss our results. The overall result is that ho and ni are best, although each 
has a bad case, and on bicycle wheels they both lose asymptotically to k and ks. We give more 
details by first discussing the results on each problem family, and then discussing each algorithm. 


In this section we present most of the data in the form of plots. Full data appears in tabular 
form in the Appendix. The plots always have log(running time) as the vertical axis. For families 
where we are varying the size or density of the graph, we also use a logarithmic scale for the 
horizontal axis. Since we expect the running time of the algorithms to be expressible as $c_1 n^{c_2}$, log-log 
plots are appropriate: the y-intercept tells us $c_1$ and the slope tells us $c_2$. So parallel lines 
correspond to algorithms with the same asymptotic performance and different constant factors, and 
different slopes correspond to different asymptotic performance. Where appropriate we use a 
linear regression to compute the slopes and intercept of the best fit line and report these performance 
functions in a table. 


Note that our timer was not precise below 0.01 seconds, and we cannot plot 0.00 on a log 
scale, so any timer results of 0.00 were translated to 0.01 for plotting purposes. Such small values 
probably should not be trusted in any case. 
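The fitting procedure described above can be sketched as an ordinary least-squares fit in log-log space. This is an illustration under stated assumptions, not the script used to produce our tables: the function name `fit_power_law` is hypothetical, and the `floor` argument mirrors the translation of 0.00 timings to 0.01.

```python
import math


def fit_power_law(sizes, times, floor=0.01):
    """Fit times ~ c1 * sizes**c2 by least squares on the logarithms.

    Timings below `floor` are raised to `floor` first, since log(0) is undefined.
    Returns the pair (c1, c2).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(max(t, floor)) for t in times]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c2 = sxy / sxx                        # slope of the best fit line
    c1 = math.exp(mean_y - c2 * mean_x)   # exponential of the y-intercept
    return c1, c2


if __name__ == "__main__":
    sizes = [1024, 2048, 4096, 8192]
    times = [3e-6 * n ** 2 for n in sizes]  # synthetic quadratic timings
    print(fit_power_law(sizes, times))
```

Note that the intercept of the fitted line is the logarithm of the constant factor, so the sketch exponentiates it before returning.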
Figure 4.3: All implementations on the TSP instances. Note that the x-axis of this plot has no 
meaning; the points are connected by lines because the lines seem to make it easier to read the plot. 
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4.2.1 Results by Problem Family 
TSP 


The TSP family turned out to be a study in the effectiveness and subtlety of PR tests. The most 
striking result here is the difference between hybrid and ni, which for the “USA” instances (17-32) 
is approximately a factor of 1000 (see Figure 4.3). This difference is almost entirely due to the PR 
strategy. Recall that hybrid does have PR tests, yet it behaves like ni_nopr. Most of the difference 
is our PR preprocessing, which reduces the size of the USA problems by a factor of 10 to 15 in 
about a tenth of a second. However, if we factor out this difference by preprocessing the instances 
before running the codes on them (Figure 4.4), we find that ni is still gaining something over both 
ni_nopr and hybrid, so our internal PR strategy is also gaining us something. 


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Figure 4.4: NI variants on preprocessed TSP USA instances (PRETSP 17-32). 


NOI 


Overall, random graphs with heavy components serve as a demonstration of the good case for 
NI. These graphs have many “extra” edges, and as one would hope, all of the implementations 
of NI were able to exploit this property to run in near-linear time. Further, using Matula’s ap- 
proximation algorithm to get a good cut upper bound and computing a sparse certificate based 
on that value was sufficient to solve almost every instance, so the behavior of k on almost all of 
these problems is the behavior of this preprocessing step. Notice that k is still several (roughly 4) 
times slower than ni. There are two reasons for this. First, since the contractions done by Matula’s 
algorithm must be undone, k has extra overhead in the contraction code to allow contractions to 
be undone. Second, k must always do at least two sparse certificate computations: at least one for 
Matula’s algorithm and then the one that uses the value computed by Matula’s algorithm. ni, on 
the other hand, typically needs only one sparse certificate computation for these graphs. 


Figure 4.5: All implementations on random graphs (varying size). 


Figure 4.6: All implementations on random graphs with 2 heavy components (varying size). 
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Figure 4.7: All implementations on random graphs (varying density). 


Varying the number of nodes and the density of the graphs, all the implementations behave 
with similar asymptotics (Table 4.4 and Figures 4.5-4.8). In fact, only KS distinguishes itself, and 
that is for having significantly worse constant factors. Actually, KS’s analysis is failing it on these 
problems. The variant was designed with graphs of this nature in mind, and indeed ks typically 
ends up with very shallow recursion trees. However, while it appears that the success probability 
here should be constant, we could not find the right way to determine that fact on-line. The 
problem is that we cannot just look at the depth of the trees and use the actual depth to revise 
our estimate of success probability, because if we condition on the fact that a recursion tree has 
small depth, we find that the probability we have contracted a minimum cut edge increases. So 
all we can say is that on these families ks actually performs quite well, but we do not know how 
to recognize this fact on-line and terminate early. 


Varying the number of components is more interesting (Figure 4.9). For one thing, there is a 
threshold value, after which the PR preprocessing solves the problem. This threshold appears to 
occur when the intercomponent cuts become as large as the intracomponent cuts. Note that 
we can see the slowdown k has in the contraction code, because after we cross the threshold it 
runs a constant factor slower than the other implementations with PR preprocessing. We also see 
that excess detection can sometimes fill in for PR tests, as ho_nopr improves performance at the 
same threshold, whereas ho_noprxs does not change behavior. 


Figure 4.8: All implementations on random graphs with 2 heavy components (varying density). 


Table 4.4: Asymptotic behavior of the implementations on graphs with heavy components. 


Figure 4.9: All implementations on random graphs with heavy components (varying number of 
components). 


Another interesting aspect of varying the number of components is that for five and seven 
components, preprocessing with Matula’s algorithm and one sparse certificate computation does 
not solve the problem. For all other numbers of components it does. Notice that in the two cases 
where it has to do some work, k reveals that its performance on these graphs is better than ks, but 
not very good. 


Finally, varying the capacity of the intracomponent edges (Figure 4.10), we see that for very 
high values, most of the implementations improve performance. They all seem to improve at 
the same point, but for very different reasons. For ks, the improvement comes because the trees 
become very shallow when there is so much extra edge capacity to be picked for contraction. For 
ni, fewer sparse certificate computations are necessary, as it finds more excess edges to contract. 
For ho, excess detection quickly causes most of the vertices to be contracted away. 


REG 


Regular random graphs (Tables 4.5 and 4.6, Figures 4.11-4.13) are the bad case for NI. In fact, they 
induce NI’s worst case O(mn) time performance. This fact is immediately apparent on both sparse 
and dense graphs. The only place where any NI implementation manages to perform well is when 
the graphs are very dense, and the PR tests kick in. 
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Figure 4.10: All implementations on random graphs with heavy components (varying “heaviness” 
of components). 
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Figure 4.11: All implementations on sparse regular random graphs (varying density). 
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Figure 4.12: All implementations on sparse regular random graphs (varying size). 
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Figure 4.13: All implementations on dense regular random graphs. 
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Table 4.5: Asymptotic behavior of the implementations on sparse regular random graphs. (These 
fits do not include the unusual cases n = 1000, d = 256,512.) 
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Table 4.6: Asymptotic behavior of the implementations on dense regular random graphs. 
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These graphs are also sufficiently sparse that k just runs Gabow’s minimum cut algorithm. 
Gabow’s algorithm is apparently competitive with ho when the cut value is very small, but as the 
value increases performance quickly degrades. The transition away from Gabow’s algorithm is 
the reason for the sudden change in behavior that can be seen in Figure 4.11 when the graph gets 
dense. We were unable to run large enough problems to compare k and ho for the case where the 
graph is sparse, but has at least O(log n) cycles. 


Note that k’s terrible asymptotic performance on the dense instances is an artifact of small 
problems. For these instances, the number of nodes is small enough that even though they have 
$O(n^2)$ edges, k is deciding that they are sparse enough to use Gabow’s algorithm on. The n = 2048 
case does not quite fall on the fit line in Figure 4.13, because it is the first instance where Gabow’s 
algorithm is not used. The asymptotic performance will improve after this point. 
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Figure 4.14: All implementations on irregular random graphs. 


Irregular random graphs (Figure 4.14) mainly show the significant dependence NI has on graph 
regularity. All implementations of NI tend to do poorly when the input is very 
regular, as in the regular random graphs, whereas they all do well when the input is irregular, 
as in the random graphs with heavy components. Intuitively, this behavior is in accord with the 
nature of the algorithm: irregular graphs have “extra” edges that will not be in a sparse certificate 
and will therefore be contracted. This family shows just how dramatic the difference is. 
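The notion of “extra” edges can be made concrete with a small sketch. It assumes an unweighted graph and builds the certificate from k successive maximal spanning forests using union-find, rather than the scan-first search that the NI implementations use; the name `sparse_certificate` is illustrative.

```python
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def sparse_certificate(n, edges, k):
    """Split `edges` into (certificate, extra) for an unweighted graph on 0..n-1.

    The certificate is the union of k successive maximal spanning forests;
    every other edge is reported as extra.
    """
    remaining = list(edges)
    certificate = []
    for _ in range(k):
        parent = list(range(n))
        leftover = []
        for u, v in remaining:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv           # edge joins two trees of this forest
                certificate.append((u, v))
            else:
                leftover.append((u, v))   # endpoints already connected in this forest
        remaining = leftover
    return certificate, remaining


if __name__ == "__main__":
    k5 = list(combinations(range(5), 2))
    cert, extra = sparse_certificate(5, k5, 2)
    print(len(cert), extra)
```

The endpoints of an extra edge are connected in each of the k forests, so any cut separating them has value greater than k. Once a cut of value at most k is known, such edges can be contracted without losing a better cut.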


It is also interesting that excess detection appears to be more effective when the graph has 
extra edges, whereas the PR tests are not. 
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Figure 4.15: All implementations on bicycle wheel graphs. 


The bicycle wheel graphs (Figure 4.15 and Table 4.7) are particularly interesting because they are 
the only example we have where both deterministic algorithms lose asymptotically. Unfortu- 
nately, because k needs O(n log n) space, 128M of RAM was not sufficient memory for us to run 
a big enough instance to see k win. It appears that the crossover would take place right near the 
edge of the plot though. 


DBLCYC 


Two interleaved cycles (Figure 4.16 and Table 4.8) have interesting properties with respect to the 
PR tests. The graphs have $\Omega(n^2)$ near-minimum cuts, and random choices of edges to contract 
will not distinguish very near minimum cuts from minimum cuts, so KS should find them all. 
However, the PR tests can tell the difference, and this makes for a huge difference between ks and 
ks_nopr. ks runs quite well on these graphs, whereas ks_nopr runs so badly that we had to run it 
on some smaller instances in order to get any idea of how it behaved. 


Surprisingly, similar behavior occurs in the implementations of NI. ni does a few sparse certificate 
computations, and then the PR tests finish off the graph. hybrid’s PR strategy is not nearly as 
effective, allowing roughly n/3 sparse certificates to be computed before finishing off the graph. 
This difference leads to a significant difference in asymptotic performance. 


Table 4.7: Asymptotic behavior of the implementations on bicycle wheel graphs. 


Figure 4.16: All implementations on two interleaved cycles. 


Table 4.8: Asymptotic behavior of the implementations on two interleaved cycles. 


The second surprise about this family is that all of the implementations of HO perform badly. 
This family is the only one where ho does so badly. (Even on bicycle wheels, where ho was dis- 
tinctly losing to K asymptotically, it was still winning for all of the problem sizes we ran.) Note 
that it is not obvious how to get ni’s good performance here by some kind of sparsification pre- 
processing. ni takes a few certificate computations, so repeating sparse certificate computations 
while they help would not work here. Further, using Matula’s $(2 + \epsilon)$-approximation and using the 
resulting cut bound to compute a sparse certificate (as k does) reduces the problem but does not 
solve it. 


So it appears that this family has the property that the source PR tests do not kick in for a long 
time, whereas PR passes seem to help very soon. It would be nice to implement PR passes in ho to 
verify this conjecture. (Note that implementing PR passes in HO is non-trivial, because a PR pass 
may invalidate the distance labeling.) 


PR 


Our only goal in running these tests used by Padberg and Rinaldi was to compare our PR strategy 
to theirs. We refer to their code as pr, and look at the number of s-t cuts computed. Since our 
implementations contract nodes via other heuristics, we look at all of the variants (Table 4.9). 


It is somewhat difficult to make a meaningful comparison here because of HO’s other 
heuristics. Looking just at the number of s-t cuts computed by ho, it appears that we cannot hope to 
do much better. However, ho_noxs shows that in some cases we are not getting as much out of 
the PR tests as pr. It remains unclear whether we should be concerned about this fact or not. The 
question is whether there are times when we would miss PR tests and excess detection would not 
make up the difference. It is possible that two interleaved cycles are such a case, but we have yet 
to verify that. 


Table 4.9: Number of s-t cuts performed by HO implementations on PR graphs. 


4.2.2 Results by Algorithm 


We now shift our focus and discuss the algorithms individually. We also discuss the PR tests, as 
we feel that they deserve further discussion. 


The Padberg-Rinaldi Heuristics 


The PR heuristics are powerful, but subtle. In general, our pass strategy appears to be a good way 
to apply the tests. It never significantly slows down the implementation, and it often results in 
substantial improvements. Preprocessing with PR passes practically trivializes the TSP instances, 
and solves cycles, wagon wheels, and random graphs with many heavy components. Passes 
also significantly improve the asymptotic behavior of ni and ks on two interleaved cycles. 


We conclude that omitting PR tests from an implementation of a minimum cut algorithm 
would be a serious mistake. 


The Hao-Orlin Algorithm 


HO appears to be the best general purpose algorithm. It loses asymptotically on bicycle wheels, 
but it has the best performance for the graph sizes we ran. The only bad case is two interleaved 
cycles, where it loses to all the other algorithms. Otherwise, it performs very well. 


Notice that we do not have any dense families on which ho exhibits its worst case $O(n^2\sqrt{m})$ 
time behavior. The worst we see is $\Omega(n^2)$ behavior on bicycle wheels and two interleaved 
cycles. In the maximum flow context, there are parameterized worst case instances. However, these 
instances have many degree two nodes, which PR2 would promptly remove, so we did not try 
them. It would be nice to either find a worst case family or prove that the PR tests improve HO’s 
time bounds. 


In general, ho is so fast on its own that the heuristics do not help a great deal. On the TSP 
instances, PR preprocessing gains an order of magnitude, but that is the most extreme example. The 
heuristics also “compete” with each other for contractions. Comparing ho, ho_noxs, and ho_nopr, 
we see that they all do similar numbers of heuristic contractions. Both the PR tests and excess 
detection can be responsible for many heuristic contractions, but when we put them together 
a contraction done by one typically means one less done by the other. Further, without either 
heuristic, we get many one node layers, so the “flow computation” we are saving is often trivial. 


It is valuable to do PR preprocessing, but the value of internal PR tests is unclear. One direction 
we see that should be explored is the idea of doing PR passes periodically. Such a strategy may 
improve ho’s bad performance on two interleaved cycles. 


The behavior of excess detection is interesting. Both NOI6 and IRREG suggest that excess 
detection is good at identifying extra edges, in a way that the PR tests are not. It would be nice to 
establish a stronger statement on the relationship between excess detection and NI. 
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The Nagamochi-Ibaraki Algorithm 


Results on implementations of NI are mixed. When ni works well, it works very well, but some- 
times it does show its worst case behavior. In the long run, the sparse certificate computation 
may be more valuable than the whole algorithm. That is, the good case for ni is when the sparse 
certificate exhibits many extra edges, and this computation can be done as a preprocessing step 
for any other algorithm. Such a strategy would take advantage of ni’s performance in the good 
cases and resort to another algorithm for the bad ones. The inclusion of this strategy is partially 
responsible for the reasonable performance of k. 


ni compares favorably with hybrid. Our implementation never loses by much, and sometimes 
our improved PR strategy gives us much better performance. In particular, we win by about 
three orders of magnitude on TSP instances, and we have better (by a factor of n) asymptotic 
performance on two interleaved cycles. 


The Karger-Stein Algorithm 


ks seems to be inferior to the other implementations, but it is not clear that this statement can be 
generalized to KS. We have no problem families on which ks wins, but on bicycle wheels it only 
loses to k, which is cheating on its probabilities. 


For many families, it is not clear that the internal PR tests help a great deal, but they help 
asymptotically on two interleaved cycles, so we conclude that they are valuable. Note that PR tests in 
general help less in ks than in other codes, because we may have to undo them. That is, when we 
apply PR tests after random contractions, they may pass only because we have already contracted 
a minimum cut edge, so when we undo the random contractions (backing out of the recursion), 
we must undo the PR contractions as well. As a result, ks often does more PR contractions than 
there are nodes in the graph. 


The frustrating thing about implementing KS is its Monte Carlo nature—correctness depends 
on the analysis, and tight analysis is difficult. Our evidence on tightness is mixed. For several 
random graphs with heavy components, it occasionally happens that the discovery time is a sub- 
stantial fraction (more than 20%) of the post-preprocessing time, which says that if we ran many 
fewer recursion trees we would get the wrong answer. However, we only picked the constants to 
guarantee a 95% success probability, so we should expect to get the wrong answer occasionally. 
Even on the problems where we see high discovery times, most of the discovery times are still 
small. So there is no hope of improving ks by many orders of magnitude, but improvement by as 
much as one order of magnitude looks plausible. 


The other evidence that ks can improve is the fact that the trees are almost always far shallower 
than the theory predicts, which suggests that the probability of finding the minimum cut is higher 
than the theory estimates. However, even though the theoretical analysis gives us an estimate of 
success probability based on depth, we cannot estimate the probability after looking at the depth 
of the tree, because conditioning on a shallow depth, it becomes more likely that we contracted a 
minimum cut edge. This situation is particularly frustrating because the PR tests, which do not make 
mistakes, are liable to be responsible for the shallow depth, but we do not know how to tell the 
difference on-line. 
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It seems that there should be a way to estimate the probability of success on-line, based on 
what the algorithm has done, and get tighter estimates than the off-line analysis would give, but 
we have not been able to find it. 


A heuristic that might help ks, but that we did not get a chance to try, is to do a sparse certifi- 
cate computation after the random contractions. If the random contractions tend to create graphs 
with extra edges, this strategy may help to reduce the depth of the recursion. 


Another possible heuristic that should be explored is deliberately overestimating the mini- 
mum cut so that the success probability of each recursion tree increases. Preliminary experiments 
with this idea suggested that it does speed up the implementation, but careful analysis of the 
running time is needed to make sure that such an implementation will not have termination prob- 
lems. 


Karger’s Algorithm 


Results on K remain inconclusive. The performance of k is not amazing, but it is not terrible either. 
Of course, since we cheated on the probabilities, it is not clear that timings on k mean anything, 
but since k never got the wrong answer, we believe that they do. It would be nice to either find a 
problem family that demonstrates the tightness of the theory results or improve the bound. 


Note that two interleaved cycles were originally designed as an attempt to stress the sampling 
probabilities. They have one minimum cut and $\Omega(n^2)$ near-minimum cuts, so they should be 
in danger of generating a sampled graph in which a tree packing might have no trees that the 
minimum cut 2-respects. However, in 70,000 runs on a 1000 node double cycle, we always found 
a tree that the minimum cut 2-respected. The reason for success, though, seems to be that when 
the sampled graph looks roughly like a cycle, there is little question that at least one tree will be 
a path going around it. We are not sure if this observation can be exploited at all to tighten the 
analysis. 
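As a rough guide to how much a failure-free run count can tell us, the sketch below computes the largest per-run failure probability that is still consistent, at a given confidence level, with seeing no failures. It assumes the runs are independent, and it speaks only to this one instance, not to other graphs. The function name `zero_failure_upper_bound` is illustrative.

```python
def zero_failure_upper_bound(runs, confidence=0.95):
    """Upper confidence bound on the per-run failure probability q,
    given `runs` independent runs with no failures.

    Solves (1 - q)**runs = 1 - confidence for q.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


if __name__ == "__main__":
    print(zero_failure_upper_bound(70000))
```

For 70,000 runs the bound is a little under 3/70000, in line with the usual "rule of three" approximation.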


Tree packing is often the bottleneck, so it would also be valuable to work more on improving 
Gabow’s algorithm. With a theoretically justifiable sampling probability, the problem we have is 
that Gabow’s algorithm needs to explicitly represent the trees, and there are too many. Perhaps 
some kind of scaling approach could work around this problem. Karger’s divide and conquer 
approach [34], which improves the asymptotic running time to $O(\sqrt{\lambda}\, m \log n)$, might also improve 
performance. 


The major implementation question that remains to be answered is what happens if we use the 
fractional packing algorithm of Plotkin, Shmoys, and Tardos to compute the tree packings instead 
of Gabow’s algorithm. There are two reasons why PST might be better. First, it does not need 
an explicit representation of the trees, so it will demand less memory, and may allow us to use 
theoretically justifiable sampling probabilities. Second, it is possible that it will often find more 
than the minimum number of trees, in which case we would be able to check only for 1-respecting 
cuts. 


Chapter 5 


Conclusion 


Our study produced several efficient implementations of minimum cut algorithms, improving the 
previous state of the art. We introduced new strategies for improving performance, and we give 
several problem families on which future implementations can be tested. 


Our tests show that no single algorithm dominates the others. ho and ni typically dominate 
ks and k, but on bicycle wheels, asymptotically the reverse is true. ho and ni are hard to compare 
directly; on regular random graphs ho does well and ni does poorly, and on two interleaved cycles 
ni does well and ho does poorly. For general purposes we would recommend ho, as it has such 
small constant factors that even when it performs “badly”, it does pretty well. After ho we would 
recommend ni. 


Our results confirm the importance of the Padberg-Rinaldi heuristics. In some cases they im- 
prove the practical performance by several orders of magnitude, and in other cases they clearly 
improve the asymptotic performance of an implementation. We conclude that omitting PR tests 
from an implementation of a minimum cut algorithm would be a serious mistake. It is said that 
implementations using graph contraction are usually difficult to code (see e.g. [28]) and may be 
inefficient, but the gains of the Padberg-Rinaldi heuristics easily make contraction worth imple- 
menting. 


There are several possible directions for future work. On the implementation side, there are 
a few possibilities that should be explored. First, further experiments on using PR tests in ho 
should be done. In particular, we would like to know if PR passes will cure the bad behavior on 
two interleaved cycles. Second, some experiments on adding sparse certificate computations to 
the other algorithms should be done. Sparse certificates might help ho a little bit, and it would be 
interesting to see if they help inside the recursion of ks. Finally, PST tree packing should be tried in 
k. This change might make it feasible for k to use theoretically justifiable sampling probabilities, 
as well as possibly improving performance. It would also be nice to try to improve Gabow’s 
algorithm, possibly with some further heuristics. 


There are also some open questions on the theory side. First, either a graph that causes worst- 
case behavior in HO should be found, or it should be proved that the addition of heuristics actually 
improves the time bounds. Second, both KS and K would benefit from more theory work. Since 
they are Monte Carlo in nature, we can only guarantee “correctness” by using the theoretical 
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analysis, and we do not believe that the constant factors of the analysis are tight in either case. 
The idea of overestimating the minimum cut in KS should be explored, and it would be nice to get 
an analysis of KS that allowed for a more on-line estimation of success probabilities. 
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